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designs  currently  available.  Those  that  have  been  catalogued  are  usually  computationally 
expensive  to  produce  and  have  severe  restrictions  in  the  number  of  factors  and/or  runs 
that  they  allow.  To  remedy  this,  we  present  a  set  of  flexible  methodologies  to  create 
design  matrices  with  little  or  no  correlation — including  saturated  nearly  orthogonal  Latin 
hypercubes.  This  new  family  of  designs  can  explore  as  many  factors  as  there  are  design 
points.  This  research  also  addresses  experiments  that  include  a  mixture  of  continuous 
and  integer  variables,  some  of  which  have  different  numbers  of  value  levels. 


v 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


vi 


TABLE  OF  CONTENTS 


I. 


II. 


INTRODUCTION . 1 

A.  CHALLENGES  IN  EXPERIMENTAL  DESIGNS  FOR 

SIMULATIONS . 1 

B.  RANDOM  LATIN  HYPERCUBES  AND  INHERENT 

CORRELATION . 4 

1.  Random  Latin  Hypercube  Generation . 5 

2.  Measure  of  Nonorthogonality  for  Design  Selection . 6 

C.  A  PROGRESSION  OF  OLH  AND  NOLH  CONSTRUCTION 

METHODS . 8 

D.  DISSERTATION  ORGANIZATION . 11 


SHAPING  CONDITIONS  FOR  APPLYING  UNCORRECTED  RANDOM 

LATIN  HYPERCUBES . 15 

A.  TOOLS  FOR  DIMENSION  SELECTION  AND  RANDOM  LATIN 

HYPERCUBE  GENERATION . 16 

1.  Creating  the  /?”“  Table . 16 


2.  A  Function  to  Predict  pffp  . 20 

a.  Initial  Analysis  of  pf" . 20 

b.  Exploring  Simple  Linear  Regression  Models  for  p'™" . 24 

c.  A  Multiple  Linear  Regression  Model  for  pffp . 24 

d.  Adequacy  of  the  Multiple  Linear  Regression  Model  for 

^min 

P map  . 25 

/ - ~\E 

e.  Application  of  the  (p%)  mm . 26 


B. 


C. 

D. 


DESIGN  DEVELOPMENT  WITH  CORRELATION-BASED 


TOOLS . 27 

1.  The  Process  for  Dimension  Selection  and  RLH  Design 

Generation . 27 

2.  A  Hypothetical  Case  Requiring  an  Uncatalogued  Design 

Dimension . 28 

EFFECTS  OF  FEWER  CANDIDATE  DESIGNS  (G) . 30 

SUMMARY . 32 


III.  CORRECTED  RLH  USING  FLORIAN’S  METHOD  FOR  NEW  DESIGNS ...33 


A.  FLORIAN’S  CORRELATION  REDUCTION  METHOD . 33 

B.  DIMENSIONALITY  AND  FLORIAN’S  METHOD . 34 

C.  CONSTRUCTING  NEARLY  ORTHOGONAL  LATIN 

HYPERCUBES . 37 

D.  SUMMARY . 39 


vii 


IV.  A  MIXED  INTEGER  PROGRAMMING  APPROACH  TO  MINIMIZE  pmap  ..41 

A.  SIMPLIFYING  THE  COMPUTATION  FOR 

NONORTHOGONALITY . 42 

B.  A  LINEAR  FORMULATION  OF  THE  MATHEMATICAL  MODEL...42 

1.  An  Initial  Mathematical  Model . 43 

2.  Transformation  of  a  Nonlinear  Optimization  Problem . 44 

3.  A  Complete  Linear  Formulation  of  the  Optimization  Problem  ....45 

C.  OPERATIONALIZING  THE  LINEAR  OPTIMIZATION 

PROBLEM . 47 

1.  The  Size  and  Complexity  of  VMIN . 48 

2.  Applying  RLH  and  Florian’s  Method  to  Initiate  VMIN . 51 

3.  Iterative  Application  of  VMIN . 55 

D.  NEW  DESIGNS  FROM  VMIN . 56 

E.  SUMMARY . 66 

V.  MANAGING  MIXED-FACTOR,  MIXED-LEVEL  EXPERIMENTS  WITH 

A  NEW  DESIGN  METHODOLOGY . 69 

A.  THE  BASE  CONTINUOUS  DESIGN . 69 

B.  DISCRETE  INTEGER  VARIABLES  AND  STACKING 

METHODOLOGY . 70 

1.  One  Subset  of  Discrete  Variables  with  a  Common  Value  Level....70 

2.  Addressing  More  Than  One  Subset  of  Discrete  Variables . 71 

3.  Formalized  Stacking  Methodologies . 72 

C.  COMBINED  AND  STACKING  METHODOLOGIES . 75 

1.  Steps  for  Creating  a  Mixed  Factor,  Mixed  Level  Design . 76 

2.  Design  Best  Practices  within  the  Mixed  Factor,  Mixed  Level 

Process . 77 

D.  EXPLORING  FIRST  RESPONSE  TO  A  BOMB  ATTACK  USING  A 

MIXED-FACTOR,  MIXED-LEVEL  DESIGN . 77 

1.  The  Thesis  Problem . 77 

2.  Applying  the  Mixed-Factor,  Mixed-Level  Methodology  to  a 

Thesis  Question . 79 

E.  SUMMARY . 84 

VI.  THE  COMPLETE  FLEXIBLE  OLH/NOLH  METHODOLOGY: 

EXPLORING  SOLDIER-LEVEL  NETWORKS . 85 

A.  PROBLEM  STATEMENT . 85 

B.  DESIGN  OF  EXPERIMENTS  AND  SIMULATION  RUNS . 86 

C.  ANALYSIS . 88 

D.  WIDESPREAD  UTILITY  OF  NEW  NOLH  DESIGNS . 91 

VII.  CONCLUSIONS  AND  RECOMMENDATIONS  FOR  FUTURE  STUDY . 93 

BIBLIOGRAPHY . 97 

APPENDIX  A.  SATURATED  NOLH  DESIGNS . 103 

APPENDIX  B.  PAIRWISE  PLOT  OF  FACTORS  IN  A167 . Ill 


viii 


APPENDIX  C.  MAJOR  ROGINSKI’S  DESIGN . 113 

APPENDIX  D.  MAJOR  BAEZ’S  MFML  DESIGN . 123 

APPENDIX  E.  A  PREDICTIVE  MODEL  WITH  QUADRATIC 

TERMS  . 125 

INITIAL  DISTRIBUTION  LIST . 129 


IX 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


x 


LIST  OF  FIGURES 


Figure  1. 

Figure  2. 
Figure  3. 


Figure  4. 
Figure  5. 

Figure  6. 
Figure  7. 

Figure  8. 
Figure  9. 

Figure  10. 
Figure  1 1 . 

Figure  12. 

Figure  13. 

Figure  14. 

Figure  15. 

Figure  16. 


Three  permutations  of  the  vector  x  results  in  an  RLH  design  for  three 
variables  and  four  sample  design  points  using  Owens’  simplified  formula 


to  create  a  lattice  LH . 6 

In  this  RLH,  columns  two  and  three  give  |p23  =  p  =  1 . 7 


The  sample  standard  deviation  of  for  1,000  G200  n  and  k 

combinations  is  relatively  small  compared  to  its  pn™p  value.  The  largest 

standard  deviation  occurs  in  small  designs,  while  the  smallest  deviations 
are  for  larger  designs . 18 

A  logarithmic  pattern  appears  between  p"™  and  k  when  n  is  constant: 

Groups  of  n  are  presented  in  increasing  order . 21 

An  exponential  decay  or  a  negative  power  function  relationship  appears 

between  /f""’  and  n  when  k  is  constant.  For  clarity,  only  a  few  sets  for 

fixed  k  are  in  the  chart . 22 

There  is  a  nearly  linear  relationship  between  and  transfonned  n.  For 

illustration  we  show  the  graph  for  k  =  7 . 23 

A  nearly  linear  relationship  appears  between  p'™'p  and  transfonned  k.  We 

illustrate  the  case  for  n  =  257 . 23 

This  residual  plot  of  1 15  different  predicted  values  shows  curvature. 26 

This  flow  chart  presents  the  combined  process  for  matrix  transformation 

and  optimization  of  an  MIP  to  construct  an  OLH  or  NOLH  design . 60 

The  structure  of  the  stacked  MFML  design . 72 

Baltimore  harbor  vignette  terrain:  actual  photo  and  within  multiagent 

simulation  (Roginski,  2006) . 78 

Time  in  CRSP  by  convoy  number  per  network  type  when  the  rate  of 

convoy  anival  is  two  convoys  per  hour  (Baez,  2008) . 89 

Partition  tree  for  mean  time  in  CRSP,  hierarchical  network  structure 

(Baez,  2008) . 90 

Regression  metamodel  for  mean  time  in  CRSP,  hierarchical  network 

structure  (Baez,  2008) . 91 

The  plot  for  the  residuals  resulting  from  a  G200  MLR  shows  curvature 
that  indicates  a  need  to  include  more  complex  tenns  in  the  regression 

model . 126 

The  residual  plot  for  the  new  multiple  linear  regression  model  with 
quadratic  terms  still  show  curvature,  but  the  sizes  of  the  residuals  are 
much  smaller  than  from  previous  models . 127 


xi 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


LIST  OF  TABLES 


Table  1. 

Table  2. 
Table  3. 

Table  4. 

Table  5. 

Table  6. 

Table  7. 

Table  8. 

Table  9. 


A  comparison  of  recent  construction  methods  to  generate  OLH  or  NOLH 
designs  shows  that  our  new  method  significantly  increases  the  number  of 

factors,  k,  that  may  be  examined  for  the  same  number  of  runs,  n . xxii 

An  example  design  matrix  for  three  factors  (i.e.,  k  =  3)  and  eight 

experiments  (i.e.,  n  =  8) . 2 

A  comparison  of  recent  construction  methods  to  generate  OLH  or  NOLH 
designs  shows  that  our  new  method  significantly  increases  the  number  of 
factors,  k,  that  may  be  examined  for  the  same  number  of  runs,  n . 11 

This  portion  of  the  G200  table  shows  the  /?"“  for  different  design 

combinations  where  the  number  of  designs  generated  in  each  trial  is  200. 

We  enter  not  applicable  (NA)  for  design  combinations  where  n  <  k . 19 

We  show  a  few  different  design  combinations  to  compare  the  standard 
deviations  of  1,000  G1000  and  G200  RLHs.  The  greatest  difference  in 
standard  deviations  between  G1000  and  G200  is  for  the  design,  n  =  17, 
k=l,  where  GlOOO’s  standard  deviation  is  smaller  by  0.0039 . 19 

A  comparison  of  /?™”  for  1000  G1000  and  G200  shows  that  the 

differences  are  within  the  magnitude  of  the  standard  deviations  shown  in 

Table  5 . 20 

Descriptive  statistics  from  the  residuals  derived  from  the  MLR  predictions 
show  negative  skewness  and  a  relatively  small  range.  The  model  tends  to 
slightly  underestimate  the  value  of  the  minimum  maximum  absolute 
pairwise  correlation  value  of  200  RLH  designs . 25 

This  table  is  a  section  of  a  G200  p'™  table.  It  has  values  for  dimensions 

with  k  =  22  or  less . 29 

t — ~\E 

Calculations  for  I  pmap  using  the  MLR  model  as  n  changes . 30 


Table  10.  The  best  maximum  absolute  pairwise  correlation  values  from  20  FRLHs 
of  the  same  design  dimension  are  presented.  Gray-shaded  design 
dimensions  emphasize  intervals  that  this  technique  can  fill  in  the  OLH  and 
NOLH  catalogue,  using  Ang’s  (2006)  near  orthogonal  threshold. 


Diagonal-lined  areas  identify  dimensions  yet  to  meet  NOLH  criteria . 35 

Table  11.  Comparison  of  pmap  values  and  CNs  for  RLH,  F{f ,  and  Aj665  . 37 


Table  12.  These  are  the  best  maximum  absolute  pairwise  correlation  values  from  20 
FRLHs  of  the  same  design  dimension.  These  designs  are  a  continuation  of 
Table  10.  All  of  these  design  dimensions,  as  well  as  the  design 
combinations  between  their  intervals,  fill  many  of  the  gaps  in  the  OLH 
and  NOLH  catalogue . 38 


xiii 


Table  13. 

Table  14. 

Table  15. 

Table  16. 

Table  17. 

Table  18. 

Table  19. 

Table  20. 

Table  21. 

Table  22. 


We  present  how  the  parameters  of  a  design  and  the  design’s  initial  values 
translate  into  the  constraints  in  VMIN  to  optimize  the  values  of  the 
targeted  variable  column  (m  =  3,  k  =  4,  n  =  5),  where  value  levels  for 

column  3  remain  as  variables  and  all  other  values  are  fixed . 48 

This  n  =  17,  k  =  7  design,  p  =  0.066,  is  a  result  from  generating 

numerous  RLHs,  selecting  the  design  with  the  least  nonorthogonality,  and 
applying  Florian’s  method  iteratively.  It  is  the  initial  design  before  using 

VMIN . 53 

Binary  variable  values  for  factor  m  =  6  in  the  initial  design  show  the 
sparseness  of  this  single  column  optimization  problem.  A  set  of  these 

variables  corresponds  to  each  factor . 54 

An  excerpt  from  Cioppa  and  Lucas  (2007)  compares  RLHs  and  Cioppa’s 
(2002)  OLH  and  NOLH  designs  with  respect  to  their  orthogonal 

characteristics . 57 

Results  from  a  small  design  after  iterative  application  of  Florian’s  method 
on  the  left-hand  is  not  an  OLH  design.  It  can  be  used  as  the  initial  design 
for  VMIN.  The  final  design  is  orthogonal,  as  shown  on  the  right-hand 

design . 58 

The  result  from  iterative  application  of  Florian’s  method  on  a  randomly 
generated  n  =  \A,k=l  design  is  the  initial  design  for  VMIN,  as  shown  on 
the  left-hand  side  of  the  table.  It  does  not  meet  NOLH  orthogonality 
criteria.  The  final  design  for  n  =  14,  k  =  7  after  VMIN  is  a  new  NOLH 

design,  presented  on  the  right-hand  side . 61 

The  A,1)1  developed  by  RLH,  FRLH,  and  VMIN  emphasizes  the  flexibility 

of  our  methodology  to  construct  any  number  of  different  NOLH  designs 
(to  include  saturated  ones).  Note  that  none  of  the  columns  of  this  design 
are  identical  to  the  design  on  the  right-hand  side  of  Table  19.  It  shows  that 
this  new  design  is  not  merely  an  extension  of  the  n  =  14,  k  =  7  design, 

although  our  technique  can  certainly  extend  smaller  designs . 62 

A  comparison  of  different  approaches  to  develop  a  design  that  can  explore 
32  variables  shows  that  our  new  S-NOLH  design  requires  the  least  number 
of  runs,  and  is  therefore  most  efficient.  Other  approaches  can  theoretically 
develop  experimental  designs  to  explore  32  or  more  variables,  but  to  the 
author’s  knowledge,  none  are  currently  in  the  catalogue  of  OLH  and 

NOLH  designs . 64 

This  is  a  new  design  from  an  original  FRLH  for  17  runs.  We  extend  the 
number  of  factors  to  explore  until  it  is  fully  saturated.  It  allows  an 
experimenter  to  explore  16  factors  within  17  runs.  The  k:n  ratio  of  0.94  is 

large  and  makes  the  design  a  good  screening  plan . 65 

A  summary  of  techniques  from  Cioppa,  Ang,  Steinberg  and  Lin,  and  our 
new  method  shows  that  the  FRLH-MIP  method  is  a  viable  option  to 
extend  the  library  of  OLH  and  NOLH  designs.  A  label  of  “NA” 
designates  dimensions  not  currently  available  from  the  technique . 66 


xiv 


Table  23. 


Table  24. 


Table  25. 


Forty  of  the  variables  in  Major  Roginski’s  experiment  are  displayed.  The 
variables  are  all  integers,  but  some  have  enough  value  levels  to  be 

considered  as  continuous,  as  the  shaded  rows  indicate . 80 

Excerpt  of  the  DV  subset  design  for  the  MFML,  containing  24  of  the  144 
total  design  points,  shows  how  a  full,  two-level  factorial  design  for  three 
binary  variables  aligns  with  three  design  points  of  two  discrete  variables, 
each  with  three  value  levels.  We  shade  sections  of  the  table  to  emphasize 
the  match  between  a  single  design  point  for  the  discrete,  non-binary 

variables  and  a  full  two-level  factorial  for  the  binary  variables . 82 

A  saturated  design  for  n  =  16  and  k  =  15  is  the  foundation  for  creating  a 
base  design  for  an  MFML  design.  Its  correlation  measure  is  0.0471,  with 
a  condition  number  of  1 .3 19 . 87 


xv 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


xvi 


LIST  OF  SYMBOLS,  ACRONYMS,  AND/OR  ABBREVIATIONS 


CRSP 

DHS 

DoD 

DOE 

FRLH 

GAMS 

GB 

GHz 

ITV 

LCM 

LCOP 

LH 

I  WARS 

LHS 

MHPCC 

MIP 

MFML 

NOLH 

OLH 

RAM 

RLH 

nns 

SLR 

S-NOLH 

SOP 


Centralized  Receiving  and  Shipping  Point 
Department  of  Homeland  Security 
Department  of  Defense 
Design  of  Experiment 

Florian-improved  Random  Latin  Hypercube 
General  Algebraic  Modeling  System 
Gigabyte 
Gigahertz 

In-Transit  Visibility 

Least  Common  Multiple 

Logistics  Common  Operating  Picture 

Latin  Hypercube 

Infantry  Warrior  Simulation 

Latin  Hypercube  Sampling 

Maui  High  Performance  Computer  Center 

Mixed  Integer  Program 

Mixed  Factor,  Mixed  Level 

Nearly  Orthogonal  Latin  Hypercube 

Orthogonal  Latin  Hypercube 

Random  Access  Memory 

Random  Latin  Hypercube 

Root  Mean  Square 

Simple  Linear  Regression 

Saturated  Nearly  Orthogonal  Latin  Hypercube 

Standard  Operating  Procedure 


xvii 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


EXECUTIVE  SUMMARY 


Experiments  using  computer  models  are  of  great  importance  to  scientific  research, 
national  defense,  and  public  policy  debates.  This  follows  from  continual  improvements 
in  computational  power  against  a  backdrop  of  rising  costs  and  other  challenges  frequently 
associated  with  physical  experimentation.  Moreover,  often  it  is  not  practical  to  conduct 
many,  or  even  any,  physical  experiments — e.g.,  the  long-term  effects  of  various  policies 
on  global  climate,  emergency  response  to  large-scale  nuclear  accidents,  potential  major 
military  conflicts,  etc.  In  situations  with  a  dearth  of  real-world  experimental  data, 
computer  models  are  frequently  used  to  help  understand  these  complex  issues. 

Computer  experimentation  in  the  above  areas  may  contain  thousands  of  input 
variables  and/or  take  a  long  time  (e.g.,  many  days)  to  run  (Kleijnen  et  ah,  2005). 
Researchers  have  several  techniques  to  effectively  extract  information  from  these  models. 
Among  them  are  designs  of  experiments  that  are  specifically  developed  for  efficiently 
exploring  high-dimensional  computer  models. 

Latin  hypercube  (LH)  sampling  (McKay  et  ah,  1979)  has  proven  to  be  an 
invaluable  design  technique.  In  fact,  LHs  are  reported  to  be  the  predominant  design  for 
experiments  involving  computer  simulations  (Buyske  &  Trout,  2001).  A  key  reason  for 
this  is  that  they  come  with  minimal  restrictions  on  the  number  of  factors  (i.e.,  input 
variables)  and  sampling  budget.  In  addition,  the  resultant  output  data  allow  us  to  fit 
many  different  models  to  multiple  outputs  from  a  single  experimental  set.  These  designs 
permit  us  to  simultaneously  screen  many  factors  for  significance  and  to  fit  very  complex 
meta-models  (including  nonparametric)  to  a  handful  of  dominant  variables.  This 
flexibility  extends  to  visual  investigations  of  the  data  (Sanchez  &  Lucas,  2002),  as  we  get 
many  viewpoints  from  which  to  observe  the  relationships  between  inputs  and  outputs. 

There  are  a  large  number  of  LHs  for  any  given  experimental  condition,  by  which 
we  mean  the  number  of  design  points  (n)  and  variables  (k).  A  design  point  is  a  unique 
combination  of  the  values  of  the  input  variables.  Some  LH  designs  possess  better 
properties  than  others.  Lor  example,  LHs  can  have  unacceptable  correlations  among  the 
input  factors,  thus  hindering  many  statistical  procedures  that  we  might  wish  to  apply  in 
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analyzing  the  relationships  between  inputs  and  outputs.  To  mitigate  this  problem,  various 
researchers  (e.g.,  Owen,  1994;  Ye,  1998;  Cioppa,  2002;  Ang,  2006;  and  Steinberg  &  Lin, 
2006)  have  developed  algorithms  to  eliminate  or  significantly  reduce  correlations  among 
input  variables — thereby  developing  orthogonal  and  nearly  orthogonal  Latin  hypercubes 
(OLH,  NOLH).  A  design  is  nearly  orthogonal  if  the  maximum  absolute  pairwise 
correlation  ( pmap )  among  the  columns  of  the  design  matrix  is  <  .05. 

The  success  of  these  recent  efforts  to  create  OLH  and  NOLH  designs  is  extensive, 
but  all  of  these  approaches  are  subject  to  stringent  constraints  in  their  dimensionality, 
usually  requiring  that  n  be  a  power  of  two  or  a  power  of  two  plus  one.  Steinberg  and  Lin 
(2006)  state,  “[t]he  primary  limitation  to  our  method  is  the  severe  sample  size  constraint.” 
For  instance,  to  explore  12  factors  Steinberg  and  Lin  require  only  16  runs.  To  explore  13 
factors,  they  require  a  design  with  64  runs. 

This  dissertation  details  an  algorithm  for  generating  NOLHs  for  a  greatly 
expanded  set  of  n  and  k,  including  situations  where  k=  n—  1,  i.e.,  fully  saturated  designs, 
as  well  as  methodologies  that  expand  on  the  usage  of  LHs  to  include  accommodating 
discrete  variables  into  nearly  orthogonal  designs.  We  further  specify  the  conditions 
under  which  these  approaches  are  appropriate.  Our  research  goals  are  listed  below. 

•  Provide  analysts  the  ability  to  apply  uncorrected  LH  designs  to  a  number 
of  situations  in  which  scientists  do  not  have  the  advantage  of  developing 
OLH  or  NOLH  designs,  or  may  not  wish  to  use  OLH  or  NOLH  designs. 

•  Provide  analysts  the  ability  to  generate  OLH  and  NOLH  designs  for  many 
more  combinations  of  n  and  k  than  previous  approaches  allowed — with  a 
particular  emphasis  on  maximizing  k  for  any  given  n. 

•  Provide  analysts  the  ability  to  incorporate  different  factor  types  (discrete 
and  continuous)  with  different  value  levels  (mixed  factor,  mixed  level) 
into  a  nearly  orthogonal  design  matrix. 

•  Provide  analysts  the  ability  to  quickly  generate  new,  robust,  special 
purpose  designs  to  meet  new  and  changing  design  requirements. 

To  meet  these  goals,  we  quantify  when  random  Latin  hypercubes  (RLHs)  are  acceptable 
and  develop  a  number  of  progressive  techniques  to  construct  and  use  NOLHs. 

Tools  and  methods,  based  on  the  maximum  absolute  pairwise  correlation,  help 
select  an  appropriate  dimension  for  the  experimental  design  and  obtain  an  uncorrected 
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RLH  with  a  suitable  p  .  Astute  selection  of  a  design  dimension  (n  =  sample  runs, 

k  =  factors)  will  often  allow  an  experimenter  to  quickly  generate  a  useful  design.  We 
present  a  procedure  that  systematically  directs  the  selection  of  an  appropriate  design 
dimension  and  provides  guidance  for  accepting  an  uncorrected  RLH,  as  RLH  designs  are 
commonly  used  by  practitioners.  This  includes  a  parsimonious  function  that  relates  the 
expected  maximum  absolute  pairwise  correlation  to  transformed  values  of  n  and  k.  This 
method  requires  no  specialized  algorithms  or  software  packages,  making  it  applicable  in 
most  situations  and  an  attractive  option  for  experimenters  with  limited  resources. 

We  expand  the  application  of  Florian’s  method  (1992)  to  produce  new  OLH  and 
NOLH  designs.  Our  studies  reveal  that  iterative  application  of  Florian’s  method  can  be 
extremely  powerful.  We  demonstrate  that  the  number  of  candidate  designs  needed  is 

n 

often  much  less  than  1,000  for  even  large  experiments.  When  n  >  50  and  k<  — , 

Florian’s  method  is  usually  sufficient  to  transform  one  randomly  generated  LH  to  meet 
our  NOLH  criteria.  Even  if  we  break  these  loose  constraints,  we  often  construct  NOLHs. 

We  use  optimization  techniques  to  construct  OLH  and  NOLH  designs  for 
dimensions  beyond  what  Florian’s  method  allows.  Optimization  presents  difficulties  for 
solving  the  experimental  design  problem  because  our  objective  function  is  nonlinear  and 
we  have  integer  constraints.  As  n  and  k  increase,  the  dimensionality  of  LHs  adds  to  the 
challenges.  Therefore,  we  develop  a  construction  methodology  based  on  a  focused 
optimization  routine  that  greatly  expands  the  set  of  n  and  k  for  which  orthogonal  and 
nearly  orthogonal  designs  are  available.  Specifically,  we  combine  RLH  generation, 
Florian’s  correlation  reduction  method,  and  optimization  of  a  mixed  integer  problem — 
using  a  heuristic  that  focuses  on  one  column  at  a  time — to  minimize  p  for  a  specified 

n  and  k.  By  minimizing  the  worst-case  correlation,  we  control  all  other  pairwise 
correlations.  To  best  illustrate  the  power  of  this  new  methodology,  we  concentrate  on 
design  dimensions  that  are  absent  from  existing  catalogues  and  ones  that  are  the  most 
difficult  to  generate  with  previous  methods;  that  is,  experiments  with  n  <50  as  k 
approaches  n. 
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Our  results  show  significant  improvements  over  the  efficiency  of  past 
experimental  designs  for  the  same  number  of  runs.  Table  1  compares  our  new  technique 
with  other  recent  methods  to  create  orthogonal  or  nearly  orthogonal  Latin  hypercube 
designs.  For  a  given  n,  our  methods  produce  designs  that  can  explore  many  more  factors. 

Table  1.  A  comparison  of  recent  construction  methods  to  generate  OLH  or  NOLH 
designs  shows  that  our  new  method  significantly  increases  the  number  of  factors, 
k,  that  may  be  examined  for  the  same  number  of  runs,  n. 


« 

Maximum  k  for  Each  Method 

Ye 

Cioppa 

Ang 

Steinberg 
and  Lin* 

New 

17 

6 

7 

8 

12 

16 

33 

8 

11 

16 

NA 

32 

65 

10 

16 

32 

56 

63** 

*  Steinl 

berg  and  Lin’s  method  uses  n  -  1  runs. 

**  New  design  uses  64  runs. 

In  addition  to  increasing  the  number  of  NOLHs,  we  exploit  the  flexibility  of  our 
new  designs  to  develop  an  approach  that  handles  mixed-factor,  mixed-level  (MFML) 
experiments.  In  accordance  with  LH  sampling  construct  (McKay  et  al.,  1979),  OLH  and 
NOLH  designs  assume  that  all  variables  are  continuous.  In  practice,  this  assumption  is 
frequently  false.  Many  experiments  include  a  number  of  discrete  variables  (that  do  not 
necessarily  have  the  same  number  of  value  levels),  along  with  continuous  factors.  We 
designate  these  as  MFML  experiments. 

MFML  experiments  can  diminish  the  advantages  of  OLH  and  NOLH  designs. 
Converting  (by  rounding)  the  actual  values  of  these  discrete  variables  onto  a  raw  OLH  or 
NOLH  design,  especially  when  there  are  a  small  number  of  runs,  often  results  in  an 
overall  design  with  high  correlations  among  input  variables.  To  remedy  the  problems 
that  MFML  experiments  present  scientists,  we  exploit  the  dimensional  flexibility  of  our 
new  designs  by  combining  them  with  stacking  methods.  We  also  combine  our  method 
with  proven  design  techniques.  The  resulting  MFML  designs  retain  much  of  the 
orthogonality  properties  of  the  basic  NOLH,  thereby  maintaining  their  utility. 

The  principal  shortfall  of  previous  methods  to  develop  OLH  and  NOLH  designs  is 
their  strict  limitations  in  dimensionality.  Our  new  techniques  and  procedures  overcome 
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many  of  these  restrictions,  thereby  enabling  scientists  and  analysts  to  apply  these 
powerful  designs  in  more  situations. 
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I.  INTRODUCTION 


Latin  hypercubes  (LHs)  have  proven  to  be  a  powerful  tool  for  conducting 
high-dimensional  computational  experiments  in  many  of  the  sciences  and  national 
security  studies.  Our  research  expands  the  set  of  orthogonal  Latin  hypercube  (OLH)  and 
nearly  orthogonal  Latin  hypercube  (NOLH)  designs  available  to  those  involved  in 
experimentation.  We  accomplish  this  by  greatly  increasing  the  feasible  combinations  of 
design  points1  (n)  and  factors2  (k)  that  are  available  for  experimentation — which 
heretofore  have  been  highly  constrained.  Furthermore,  we  offer  methods  to  create  nearly 
orthogonal  designs  when  some  of  the  input  variables  are  discrete — perhaps  with  a  large 
number  of  possible  values.  Moreover,  our  work  provides  a  descriptive  study  of  the 
correlation  structures  inherent  in  random  and  specially  constructed  LHs.  The  culmination 
of  this  dissertation  provides  the  first  high-dimensional,  fully  saturated  NOLHs. 

A.  CHALLENGES  IN  EXPERIMENTAL  DESIGNS  FOR  SIMULATIONS 

Experimentation  is  fundamental  to  knowledge  acquisition.  Unfortunately, 
physical  experimentation  is  often  infeasible  due  to  safety,  money,  time,  or  resource 
constraints.  Consequently,  experimenters  often  turn  to  computational  experimentation, 
such  as  computer  simulation,  of  the  system  of  interest.  The  ability  of  computers  to 
simulate  increasingly  complex  problems  provides  analysts  greater  potential  to  assist 
decision  makers,  such  as  those  in  the  Department  of  Defense  (DoD).  Network-centric 
warfare  and  irregular  warfare  are  but  two  of  the  areas  that  involve  vast  numbers  of 
quantitative  and  qualitative  variables.  Often  these  simulations  contain  hundreds,  or  even 
many  thousands,  of  input  variables  (Saeger  &  Hinch,  2001).  The  commercial  world  and 
natural  scientists  face  similar  dilemmas.  Studies  in  human  behavior  and  biomimetics 
often  use  computer  simulations  and  rely  on  efficient  designs  of  experiments  (DOEs), 


1  A  design  point  is  a  unique  combination  of  the  values  of  the  factors.  We  use  the  terms  design  point 
and  run  interchangeably. 

2  Factors  are  input  variables  whose  settings  or  values  can  be  controlled  by  the  experimenter.  We  use 
the  terms  factor  and  variable  interchangeably. 
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adopting  any  technique  that  allows  analysts  to  “systematically  examine  a  broader  range 
of  possible  innovations  .  .  (Booker,  2005). 

Researchers  implement  a  number  of  methodologies  to  extract  the  most  useful 
information  from  simulations.  Among  these  methods  are  experimental  designs 
specifically  developed  for  efficiently  exploring  computer  models  (Kleijnen  et  al.,  2005). 
The  design  specifies  the  inputs  for  the  experiments.  In  particular,  given  that  n 
experiments  are  to  be  conducted  over  k  input  variables,  the  DOE  is  usually  specified  by 
an  n  x  k  design  matrix  X.  Each  column  of  X  represents  an  input  variable  and  each  row 
specifies  the  input  variable  values  for  a  single  design  points  (see  Table  2). 

Table  2.  An  example  design  matrix  for  three  factors  (i.e.,  k  =  3)  and 

eight  experiments  (i.e.,  n  =  8). 


Design 

Point 

Factor i 

Factor2 

Factors 

1 

1 

0.5 

0 

2 

1 

0.5 

10 

3 

1 

1.5 

0 

4 

1 

1.5 

10 

5 

0 

0.5 

0 

6 

0 

0.5 

10 

7 

0 

1.5 

0 

8 

0 

1.5 

10 

The  information  that  is  obtainable  by  analyzing  the  data  after  conducting  the 
experiments  depends  critically  on  the  design.  For  example,  if  a  quantitative  input 
variable  only  has  two  distinct  values,  then  the  expected  response  cannot  be  modeled  as  a 
quadratic  in  that  variable.  If  we  know  in  advance  what  metamodels  we  want  to  fit  to 
describe  the  response  surface  of  the  simulation  model  (Law  &  Kelton,  2003),  and  the 
error  structure  of  the  experiments,  then  an  optimal  design  may  exist  (Fedorov,  1972). 
However,  in  many  cases,  especially  when  conducting  exploratory  analysis,  we  desire 
designs  that  “allow  one  to  fit  a  variety  of  models”  (Santner  et  al.,  2003).  For  such 
situations,  LH  sampling  (McKay  et  al.,  1979)  has  proven  to  be  an  invaluable  technique 
for  designing  high-dimensional  computer  experiments.  For  the  past  15  years  it  has 
become  an  “important  part  of  uncertainty  analyses”  (Wyss  &  Jorgensen,  1998).  Under 
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general  conditions,  LH  designs  perfonn  exceptionally  well  in  comparison  to  other 
popular  experimental  design  options  (Johnson,  2008).  In  fact,  LHs  are  purported  to  be 
the  predominant  DOEs  involving  computer  simulations  (Buyske  &  Trout,  2001). 

LHs  are  popular  designs  for  simulation  exploration  because  of  their  design  and 
analytical  flexibility  (Lucas  &  Sanchez,  2006a).  An  LH  design  can  easily  be  constructed 
for  any  number  of  continuous  input  variables  (k)  and  sampling  budget  (n)  as  long  as 
k  <  n.  Indeed,  many  simulation  software  packages,  even  spreadsheet  simulation  add-ons, 
can  generate  LHs  (Sugiyama  &  Chow,  1997).  With  sufficiently  large  n,  the  output  data 
can  be  fit  to  a  wide  variety  of  metamodels  (from  simple  to  complex)  for  many  different 
responses.  In  particular,  LHs  enable  us  to  simultaneously  screen  many  factors  for 
significance  and  fit  very  complex  metamodels  to  a  modest  number  of  critical  variables. 
Moreover,  LHs  have  good  space-filling  properties,  i.e.,  they  are  good  at  providing 
“information  about  all  portions  of  the  experimental  region”  (Santner  et  al.,  2003). 

The  existence  of  desirable  properties  in  LH  designs,  which  correspond  with  their 
degree  of  utility,  is  dependent  on  the  absence  of  correlation  among  the  columns  of  the 
design  matrix.  Many  analytical  techniques  that  experimenters  apply  to  computer 
outputs — such  as  regression  modeling  and  partition  trees — suffer  when  there  is 
multicollinearity  among  the  input  variables  (Montgomery  et  al.,  2001  and  Kim  &  Loh, 
2003).  Consequently,  analysts  usually  desire  a  design  matrix  with  a  correlation  structure 
that  is  close  to  that  of  the  identity  matrix  (Iman  &  Conover,  1982,  and  Iman  & 
Davenport,  1982).  Unfortunately,  generating  LH  designs  involves  an  inherent 
randomness  in  the  construction  of  the  columns — a  random  permutation  of  the  value 
levels,  1  through  n.  This  construction  method  invites  the  possibility  of  substantial 
multicollinearity  among  the  columns  of  the  design  matrix,  especially  when  n  is  not 
substantially  larger  than  k. 

To  mitigate  the  difficulties  that  can  be  caused  by  correlations  among  the  columns 
in  the  design  matrix,  a  succession  of  methods  have  been  developed  that  reduce  or  even 
eliminate  the  correlations.  An  LH  is  called  an  orthogonal  Latin  hypercube  (OLH)  if  the 
correlation  coefficient  between  all  pairs  of  columns  in  the  design  matrix  is  zero.  We  will 
denote  an  OLH  with  k  variables  in  n  runs  as  Onk  .  In  some  situations,  an  OLH  may  not  be 
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available  or  it  may  come  with  poor  space-filling  properties  (Cioppa  &  Lucas,  2007).  In 
such  a  situation,  a  nearly  orthogonal  Latin  hypercube  (NOLH)  may  exist  and  be 
preferred.  We  will  define  NOLHs  precisely  later  in  this  section.  An  NOLH  with  k 
variables  in  n  runs  is  denoted  as  N'l .  The  absence  or  near  absence  of  correlations  among 

their  design  columns  makes  OLH  and  NOLH  designs  attractive.  However,  there  are  only 
a  small  number  of  these  catalogued  designs  available  and  most  applications  do  not  use  an 
OLH  or  NOLH  design. 

A  variety  of  reasons  exist  for  why  OLH  and  NOLH  designs  are  not  used  more 
often  than  they  currently  are.  Efforts  to  reduce  or  eliminate  correlation  are  often 
computationally  expensive  and  time  consuming  (Cioppa,  2002).  The  complexity  of 
algorithms  involved  in  creating  them  typically  requires  specialized  software  and 
programming  (Iman  &  Conover,  1982).  Most  detrimental  for  increasing  the  use  of  OLH 
and  NOLH  designs  are  the  severe  restrictions  on  the  dimensionality  of  the  design  matrix 
(and  hence,  experiment).  In  every  case,  the  dimensions  of  new  designs  restrict  n  to  be  a 
function  of  a  power  of  2,  or  a  power  of  2  plus  1 .  Furthermore,  methods  for  creating  LH 
designs  assume  continuous  variables,  which  necessitate  rounding  off  the  raw  design 
values  to  accommodate  discrete  variables.  The  resulting  design  values  do  not  typically 
retain  the  orthogonality  (or  near  orthogonality)  properties  of  the  raw  design. 

This  dissertation  overcomes  the  above  obstacles  and  greatly  expands  the  set  of 
readily  available  OLH  and  NOLH  designs.  It  presents  new  construction  methods,  and 
augments  previous  techniques,  to  generate  design  matrices  (A)  with  little  or  no 
correlation.  While  LH  designs  assume  continuous  variables,  our  work  also  encompasses 
discrete  variables  that  may  have  unequal  numbers  of  levels.  The  resulting  design 
matrices  fill  substantial  gaps  in  the  current  library  of  OLH  and  NOLH  designs. 

B.  RANDOM  LATIN  HYPERCUBES  AND  INHERENT  CORRELATION 

The  foundation  of  our  work  resides  in  the  characteristics  of  the  random  Latin 
hypercube  (RLH).  We  briefly  describe  the  nature  of  the  RLH  as  a  reference  for  defining 
the  terminology  we  use  in  this  dissertation  and  setting  the  context  for  our  enhancements. 
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1. 


Random  Latin  Hypercube  Generation 


RLH  generation  is  so  named  to  emphasize  the  randomness  in  its  construction. 
Generating  an  RLH  is  relatively  simple.  Consider  an  n  by  k  matrix,  X.  Owen  (1994) 
describes  the  ith  element  in  the  jth  column,  Xj ,  of  an  LH  as 

f  U,(  0-u„)) 

X.  =F/\  - -  ,  for/  =  !,...,«  and  j  =  \,...k,  where  zr  -  (1), ...,  zr .(«)  is  one  of  the 


V  J 

n\  possible  random  permutations  of  1,...,  n  in  which  all  n\  permutations  are  equally 
likely.  The  F \,j  =  1,  ...k  are  continuous  and  invertible  distribution  functions.  Finally, 


the  Utj,i=  1 n,j  =  1,  ...k  are  independent  and  identically  distributed  unifonn  [0,  1] 
random  variables. 

Similar  to  Owen  (1994),  Ye  (1998),  Cioppa  (2002),  and  Steinberg  and  Lin  (2006), 
we  use  a  lattice  version  of  the  LH  (Patterson,  1954)  and  assume  that  F.  is  the  distribution 

function  of  a  uniform  [0,  1]  for  i  =  1 n,j  =  1  ,...k.  We  further  replace  the  uniform 
random  variable,  U tj ,  with  0.5,  the  median  of  a  uniform  [0,  1]  distribution;  thus  creating  a 

center  point  in  each  stratum.  Therefore,  generating  an  RLH  reduces  to  independently 
generating  k  columns,  each  a  permutation  of  the  integers  i  =  1,...,  n,  where  the  n\ 
permutations  are  equally  likely.  For  a  given  column,  j  =  1,  2,...,  k,  this  yields: 

(;r  (/)  -  .5) 

X/  =- - -,  for  i  =  .  The  columns  of  the  resulting  design  matrix,  X1 ,  for 

n 


j  =  1,  ,...,  k,  are  k  independent  permutations  of  the  vector  x,  which  contains  elements 


derived  from  the  following  simplified  formula  (Owen,  1994):  x.  = 


5) 


,  for  i  =  1,...,  n  . 


We  illustrate  this  method  by  examining  three  factors  ( k  =  3)  with  four  sample  runs 
(n  =  4).  The  experimenter  calculates  the  first  element,  /  =  1,  from  the  simplified  formula 

- — ^ =  ~~  =  ~  ■  Computation  of  the  remaining  elements,  for  /  =  2,  3,  and  4, 


results  in  the  vector  x  = 


13  5  7 
8  ’  8  ’  8  ’  8 


.  A  random  permutation  of  x  for  each  column 


produces  a  lattice  LH  for  three  factors  and  four  design  points,  as  shown  in  Figure  1. 
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Figure  1.  Three  permutations  of  the  vector  x  results  in  an  RLH  design  for  three 
variables  and  four  sample  design  points  using  Owens’  simplified  fonnula  to 
create  a  lattice  LH. 

The  columns  in  X  are  often  translated  and  scaled  for  a  particular  application.  For 
a  given  column,  the  values  are  equally  spaced  between  the  minimum  and  maximum 
values  allowable  for  the  corresponding  variable.  Given  this  lattice  construction,  a  total  of 
(n\)  permutations  are  possible  for  each  column.  This  does  not  create  (n!)k  unique  LHs 
because  some  designs  can  be  mapped  to  others  by  permuting  rows  and  columns. 
However,  the  total  number  of  unique  designs  is  astronomically  large.  For  example,  there 
are  over  87  billion  unique  designs  with  k=2  and  n=  14. 

The  simplicity  and  flexibility  of  this  sampling  scheme  has  a  price.  It  is  precisely 
the  randomness  of  column  generation  that  opens  the  possibility  for  strong 
multicollinearity  among  the  columns  of  the  design  matrix. 


Measure  of  Nonorthogonality  for  Design  Selection 


Along  with  Tang  (1998),  we  recognize  that  selecting  a  design  based  on  the 
correlations  between  the  columns  of  the  design  matrix  is  a  reasonable  way  to  obtain  a 
design  with  acceptable  nonorthogonality.  Cioppa  (2002)  specifically  uses  the  maximum 
absolute  pairwise  correlation  between  columns  of  an  LH  to  obtain  OLH  and 
NOLH  designs. 

Following  Cioppa,  we  designate  the  maximum  absolute  pairwise  correlation 

(k\ 

( pma  )  as  the  key  measure  for  discriminating  between  designs.  There  are  pairwise 
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correlations  in  a  design  with  k  variables.  Computation  of  the  correlation  coefficient 

6 


between  any  two  vector  columns  in  design  matrix  X  is  straightforward.  Using  Pearson’s 
equation,  the  correlation  between  X'  and  Xj  is  given  by 

XlK -*')(**  ~xJ)\ 

P  —  —  , 

V  b= 1  b= 1 

where  x‘  and  x  J  are  the  mean  values  of  the  7th  and /h  columns  and  x‘h  is  the  bth  value  of 
the  7th  column. 

The  largest  absolute  correlation  among  the  columns  gives  the  most  extreme 
pairwise  correlation.  We  use  this  value  to  define  the  degree  of  nonorthogonality  of  a 
design.  By  minimizing  the  worst-case  pairwise  correlation,  we  control  all  other  pairwise 
correlations.  Hence,  we  focus  attention  on  p  throughout  our  study  and  creation  of 

new  OLH  and  NOLH  designs.  Specifically, 

f?{  kl  }■ 

Our  collection  of  techniques  minimizes  pmap.  We  further  use  p  as  a  guide  for 

combining  reliable  design  approaches  with  our  new  designs,  thereby  expanding  the  utility 
of  OLH  and  NOLH. 

The  role  of  p  in  our  study  is  evident  in  the  large  values  in  which  it  appears  in 

RLHs,  especially  when  k  is  not  too  much  smaller  than  n.  Consider  the  design  matrix  in 
Figure  2.  Three  new  pennutations  of  vector  x  result  in  this  design  matrix,  where  columns 
2  and  3  have  a  correlation  value  of-1,  i.e.,  p  =  1 . 

' 3  ]_  7" 
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Figure  2.  In  this  RLH,  columns  two  and  three  give  p23  =  pmap  =  1 . 
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In  a  sampling  method  in  which  all  possible  RLH  designs  are  equally  probable 
(Satterthwaite,  1959),  the  probability  that  a  highly  correlated  design  occurs  is  high.  To 
illustrate,  we  randomly  generate  1 ,000  4  x  3  RLH  designs  and  measure  the  p  of  each. 

Over  77%  of  the  designs  have  a  p  >  0.80 ,  with  nearly  25%  having  p  =  1 .  Clearly, 

the  possibility  of  incurring  high  correlations  in  an  RLH  can  be  problematic. 

Whereas  randomness  creates  a  concern  of  possible  high  correlations  among  the 
columns  in  the  design  matrix,  it  may  also  result  in  RLH  designs  that  have  very  small 
correlations.  This  observation  offers  an  opportunity  to  develop  a  systematic  process  for 
selecting  suitable  RLH  designs  and  applying  a  variety  of  transformation  and  generation 
techniques  to  minimize  their  nonorthogonality.  In  doing  so,  possibilities  to  reduce 
restrictions  in  dimensionality  also  emerge. 

C.  A  PROGRESSION  OF  OLH  AND  NOLH  CONSTRUCTION  METHODS 

Techniques  to  create  new  OLH  and  NOLH  designs  have  produced  incremental 
increases  in  feasible  n  and  k  combinations.  A  succession  of  studies  extends  the 
foundational  paper  by  McKay  et  al.  (1979)  to  produce  methods  that  can  reduce  the 
correlations  of  a  given  LH  design.  Iman  and  Conover  (1982)  induce  rank  correlation  in 
the  design  to  correspond  with  the  correlation  that  one  would  expect  from  the  input 
variables,  resulting  in  a  means  to  control  the  correlation  of  the  design  matrix.  Similar  to 
Iman  and  Conover  (1982),  Florian  (1992)  updates  LH  sampling  designs  through  matrix 
transformation.  Owen  (1994)  introduces  a  process  based  on  Gram-Schmidt 
orthogonalization  to  control  correlations  among  variables.  These  methods  attempt  to 
convert  LH  designs  to  correspond  with  correlation  matrices  that  approach  the  identity 
matrix.  They  do  not  guarantee  the  complete  elimination  of  correlation  from  the  design, 
although  uncorrelated  designs  are  achieved  in  some  cases. 

A  relatively  recent  wave  of  techniques  completely  eliminates  correlation  among 
the  columns  of  an  LH  design  during  its  construction.  Ye  (1998)  introduces  a  class  of 
LHs  with  no  correlation  among  its  columns  and  designated  them  OLHs.  Ye  finds  that  it 
is  possible  to  develop  OLH  designs  when  the  number  of  runs,  n,  is  a  power  of  2  plus  1. 
Through  Ye’s  method,  experimenters  can  generate  an  orthogonal  design  ( p  =0  and 
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condition  number  =  l)3  with  k  =  2m  -  2  for  an  experiment  with  n  =  2'"  +  1  runs,  where  m 
is  a  positive  integer.  Each  column  in  the  design  uses  a  permutation  matrix  that  is  based 
on  the  Kronecker  product  (Graham,  1981)  of  identity  matrices  and  a  transposition  matrix. 
Ye  uses  all  m  of  his  main  permutation  matrices  and  m  -  2  permutation  matrices  from  the 


possible  pairwise  multiplications  of  the  main  permutation  matrices.  Ye  (1998) 


understands  that  the  sizes  of  his  OLH  designs  are  somewhat  inflexible,  but  asserts  that 
computer  power  mitigates  this  disadvantage. 

Cioppa  (2002)  extends  Ye’s  method  for  constructing  OLH  designs  by  increasing 
the  number  of  permutation  matrices  that  generate  mutually  orthogonal  columns.  Instead 

f  m  - 1 ) 

of  using  m  -  2  of  the  pairwise  combinations,  Cioppa  uses  all  pairwise  matrix 


multiplication  combinations.  In  so  doing,  Cioppa  increases  the  number  of  columns  ( k )  in 

f  m  -1) 

the  design  matrix  from  2m  -  2  to  m  +  ,  while  maintaining  their  full  orthogonality 


for  the  same  number  of  runs. 

Cioppa,  however,  notices  that  this  approach  can  yield  an  OLH  that  does  not  have 
good  space-fdling  properties.  Hence,  Cioppa  develops  a  computationally  intensive 
algorithm  that  improves  on  the  space-filling  properties  of  Ye’s  designs  by  accepting  a 
small  amount  of  nonorthogonality.  Cioppa  selectively  catalogues  designs  with 
space-filling  qualities  that  are  better  than  Ye’s  designs,  in  tenns  of  the  modified  L2 
discrepancy  (ML?)  and  Euclidean  maximin  (Mm)  measures.  The  amount  of 
nonorthogonality  that  Cioppa  accepts  in  terms  of  pmap  is  at  most  0.03  and  X  must  also 

have  a  condition  number  less  than  1.13  (when  the  columns  of  X  are  scaled  to  span  the 
interval  [-1,1]).  We  recall  that  an  OLH  has  pmap  equal  to  0  and  condition  number  equal 

to  1.  The  slight  nonorthogonality  of  the  design  earns  its  name  as  an  NOLH.  Cioppa 
demonstrates  that  his  method  works  for  up  to  k  =  67  and  conjectures  that  for  any  positive 


3  The  condition  number  is  the  ratio  of  the  largest  to  smallest  eigenvalues  of  X’X,  where  X  is  the 
11  x  k  design  matrix  (Cioppa  &  Lucas,  2007).  The  condition  number  represents  the  design  matrix’s 
sensitivity  to  small  changes  in  its  values  (Leon,  2002). 
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integer  value  of  m,  an  NOLH  design  can  be  constructed  whenever  n  =  2"!  +  1  and 


m  + 


m  - 1 
v  2  , 


Existing  space-filling  NOLH  designs  from  Cioppa’s  method  are 


catalogued  for  up  to  29  factors  within  257  runs. 

Ang  (2006)  augments  Cioppa’s  methodologies  to  create  larger  OLH  and  NOLH 


dimensions.  While  Cioppa  uses  all 


in  - 1 


pairwise  matrix  multiplication  combinations, 


Ang  uses  all  three-way,  and  larger,  product  combinations  to  create  new  permutation 
matrices.  Each  new  pennutation  matrix  generates  a  new  column  in  the  OLH  design.  The 


number  of  factors  that  Ang’s  designs  can  explore  is  k  =  1  +  ^ 

j= i 


m  - 1 


where  p  (with  the 


v  J  ) 

maximum p  =  m  - 1 )  is  the  p- way  combinations  used  for  developing  the  permutation 
matrices  and  m  is  a  positive  integer.  The  number  of  runs  required  to  explore  k  factors  is 
still  n  =  2m  +  1.  Lor  instance,  a  design  with  m  =  5  can  explore  k  =  16  factors  when  using 
the  maximum  (p  =  4)  combinations.  Such  a  design  requires  n  =  25  +  1  =  33  runs.  This 
technique  results  in  designs  that  can  explore  more  factors  with  the  same  number  of  runs. 
These  designs  begin  to  approach  a  certain  saturation  level,  with  a  noted  decrease  in  their 
space-filling  properties  as  p  increases.  Additionally,  Ang  (2006)  redefines  the  NOLH 
criteria  as  p  <0.05  and  a  condition  number  of  no  more  than  1.20.  We  use  these 


thresholds  for  our  new  methodologies,  with  a  particular  emphasis  on  minimizing  p  . 

Steinberg  and  Lin  (2006)  rotate  two-level  factorial  designs  to  construct  OLHs  for 
n  =  2h,  with  h  a  power  of  2,  and  the  maximum  number  of  factors  being  Bi/h,  where 

,  with  |_cj  =  the  maximum  integer  less  than  or  equal  to  c.  Lor  instance,  for 


Bi, 


n  - 1 


n  =  16  runs,  h  =  4  and  B\x  =  3,  so  a  0\l  is  possible.  We  note  that  these  OLHs  do  not 

include  a  center  point,  as  do  previously  discussed  OLHs. 

The  success  that  these  recent  efforts  have  had  in  creating  OLH  and  NOLH 
designs  is  extensive,  but  OLH  and  NOLH  designs  are  still  subject  to  stringent  constraints 
in  their  dimensionality.  As  Steinberg  and  Lin  (2006)  state,  “[t]he  primary  limitation  to 
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our  method  is  the  severe  sample  size  constraint.”  We  see  that  to  explore  12  factors  with 
their  algorithm  requires  only  16  runs.  To  explore  13  factors,  Steinberg  and  Lin  require  a 
design  with  64  runs.  In  comparison,  Ang  (2006)  needs  33  runs  to  explore  up  to  16 
factors.  However,  to  explore  17  factors,  Ang  must  increase  the  number  of  runs  to  65. 
Table  3  compares  our  new  method  with  recent  methods  to  create  orthogonal  or  nearly 
orthogonal  designs. 


Table  3.  A  comparison  of  recent  construction  methods  to  generate  OLH  or  NOLH 

designs  shows  that  our  new  method  significantly  increases  the  number  of  factors, 
k,  that  may  be  examined  for  the  same  number  of  runs,  n. 


« 

Maximum  k  for  Each  Method 

Ye 

Cioppa 

Ang 

Steinberg 
and  Lin* 

New 

17 

6 

7 

8 

12 

16 

33 

8 

11 

16 

None 

32 

65 

10 

16 

32 

56 

63** 

*  Steinberg  and  Lin’s  method  uses  n  -  1  runs. 
**  New  design  uses  64  runs. 


All  existing  OLH  techniques  have  a  common  inflexibility  in  dimensionality.  Our 
new  approach  constructs  flexible  NOLH  designs — that  is,  sample  sizes  need  not  be 
related  to  a  power  of  2  (any  n  is  allowable)  and  many  feasible  k  (often  all  k—  1,  . .  .n  -  1) 
are  possible.  Indeed,  in  many  cases,  we  create  fully  saturated  designs  that  are  orthogonal 
or  nearly  orthogonal.  Furthermore,  the  flexibility  of  our  OLH  and  NOLH  designs  make  it 
possible  to  combine  them  with  other  methods  to  create  experimental  designs  that  include 
discrete  variables. 

D.  DISSERTATION  ORGANIZATION 

The  flow  of  this  dissertation  is  a  progression  of  techniques  and  procedures  that  we 
build  to  meet  our  research  goals.  The  resulting  tools  and  products  greatly  expand  our 
ability  to  explore  broad  regions  of  a  complex  simulation  model  containing  a 
high-dimensional  input  space,  characterized  by  a  response  surface  that  may  be  highly 
nonlinear.  We  introduce  several  methodologies  that  enable  analysts  to  use  OLHs  and 
NOLHs  in  more  situations.  Furthermore,  we  specify  the  conditions  under  which  the 
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various  approaches  are  appropriate.  We  also  develop  algorithms  for  accommodating 
discrete  variables  into  nearly  orthogonal  designs.  Our  research  goals  are  listed  below. 

•  Provide  analysts  the  ability  to  apply  uncorrected  LH  designs  to  a  number 
of  situations  in  which  scientists  do  not  have  the  advantage  of  developing 
OLH  or  NOLH  designs,  or  may  not  wish  to  use  OLH  or  NOLH  designs. 

•  Provide  analysts  the  ability  to  generate  OLH  and  NOLH  designs  for  many 
more  combinations  of  n  and  k  than  previous  approaches  allowed — with  a 
particular  emphasis  on  maximizing  k  for  any  given  n. 

•  Provide  analysts  the  ability  to  incorporate  different  factor  types  (discrete 
and  continuous)  with  different  value  levels  (mixed  factor,  mixed  level) 
into  a  nearly  orthogonal  design  matrix. 

•  Provide  analysts  the  ability  to  quickly  generate  new,  robust,  special 
purpose  designs  to  meet  new  and  changing  design  requirements. 

The  following  four  chapters  each  describe  techniques  to  reach  our  study  goals. 
Chapter  II  discusses  RLHs  more  thoroughly,  to  provide  the  reader  with  a  better 
understanding  of  their  utility  and  to  build  a  foundation  for  other  techniques.  It  contains 
the  mathematical  theory  underlying  our  designs  and  the  details  necessary  to  construct 
them.  We  discuss  pmap  as  the  nonorthogonality  measure  that  is  our  focal  point  for  all 

design  techniques.  We  create  tools  based  on  pmap  to  select  dimensions  for  an  RLH 

design  that  has  a  very  good  chance  of  possessing  an  acceptable  nonorthogonality 
measure.  Chapter  III  quantifies  the  effectiveness  of  Florian’s  (1992)  correlation 
reduction  method,  as  well  as  its  limitations,  for  producing  OLH  and  NOLH  designs. 
Florian’s  method  is  an  earlier  technique  that  has  great  utility  in  constructing  a  number  of 
nearly  orthogonal  designs  that  do  not  confonn  to  previous  restrictions  in  their 
dimensions.  Chapter  IV  contains  a  combined  application  of  RLH  generation,  Florian’s 
(1992)  method,  and  mixed  integer  programming  (MIP)  to  produce  new  OLH  and  NOLH 
designs  that  greatly  expand  on  what  was  previously  available.  The  difficulty  in  using 
optimization  to  solve  combinatorial  problems,  such  as  DOEs,  prevents  scientists  from 
pursuing  it  as  a  practical  solution  method.  We  formulate  the  MIP  problem  to  make 
optimization  a  viable  option.  Application  of  these  combined  techniques  results  in  a  new 
family  of  designs  that  we  call  saturated  nearly  orthogonal  Latin  hypercubes  (S-NOLH). 
We  discuss  in  detail  the  advantages  that  this  family  of  designs  offers  to  scientists. 
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Chapter  V  considers  mixed-factor,  mixed-level  experiments  that  include  (perhaps  many) 
discrete  variables  that  do  not  necessarily  have  the  same  number  of  possible  values.  The 
mixture  of  these  variables  utilizes  a  systematic  process  for  leveraging  OLH  and  NOLH 
designs  constructed  using  other  techniques.  We  apply  crossed  designs,  column 
pennutation,  stacking  methods,  and  logical  lines  of  processes  to  produce  designs  that 
retain  much  of  the  orthogonal  properties  of  the  base  design.  The  base  design  is  the  set  of 
continuous  variables  from  which  we  create  an  OLH  or  NOLH  design,  as  articulated  from 
techniques  in  earlier  chapters. 

Chapter  VI  demonstrates  a  case  study  for  our  designs.  We  survey  a  number  of 
studies,  over  a  variety  of  applications,  which  use  our  new  designs.  Finally,  Chapter  VII 
summarizes  our  conclusions  in  this  research  area  and  provides  directions  for 
future  research. 
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II.  SHAPING  CONDITIONS  FOR  APPLYING  UNCORRECTED 
RANDOM  LATIN  HYPERCUBES 


Latin  hypercubes  (LHs)  are  one  of  the  most  commonly  used  designs  for  computer 
experiments  (Kleijnen  et  ah,  2005).  However,  the  regularity  in  which  high  correlation 
occurs  in  some  of  these  designs  complicates  analysis  of  the  results.  Efforts  to  reduce  or 
eliminate  correlation  are  often  computationally  and  monetarily  expensive,  require  special 
purpose  software,  and  can  be  time  consuming.  Many  scientists  frequently  use 
uncorrected4  random  LH  (RLH)  designs  in  their  experiments  and  attempt  to  work  with 
the  inefficiencies  that  may  result  from  them. 

We  develop  tools  and  techniques,  based  on  the  maximum  absolute  pairwise 
correlation,  to  select  an  appropriate  dimension  for  the  experimental  design  and  obtain  an 
uncorrected  RLH  with  acceptable  correlations.  This  method  requires  no  specialized 
algorithms  or  software  packages,  making  it  applicable  in  most  situations.  The  speed  with 
which  our  method  can  be  employed  makes  it  an  attractive  option  for  experimenters  with 
limited  resources. 

We  propose  an  approach  to  find  designs  with  acceptable  correlations  among  their 
columns:  dimension  selection  and  RLH  generation.  Current  methods  to  minimize 
correlation  in  LH  designs  generally  fall  into  two  broad  categories:  (1)  those  that 
transform  an  original  design  to  have  an  acceptable  degree  of  correlation,  or  (2)  those  that 
eliminate  correlation  during  construction  of  the  design  columns.  Our  technique  falls  into 
neither  category.  We  accept  the  natural  proclivity  of  RLH  production  to  frequently  have 
high  correlation  among  the  columns  and  use  the  randomness  in  column  generation  to 
our  advantage. 

We  assert  that  astute  selection  of  a  design  dimension  (n  =  sample  runs, 
k  =  factors)  and  a  predictive  model  for  an  RLH’s  expected  maximum  absolute  pairwise 
correlation  will  often  allow  an  experimenter  to  quickly  generate  an  acceptable  design. 
Our  method  takes  advantage  of  the  ease  with  which  an  RLH  design  is  created.  Coupling 
rapid  RLH  generation,  as  we  described  in  Chapter  I,  with  the  degree  to  which  we  can 

4  An  uncorrected  LH  is  one  to  which  no  correlation  reduction  method  is  applied. 
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predict  the  maximum  absolute  pairwise  correlation,  our  procedure  systematically  directs 
the  selection  of  an  appropriate  design  dimension  and  provides  guidance  for  accepting  an 
uncorrected  RLH.  Because  this  method  requires  no  manipulation  of  the  RLH,  any 
investigator  can  employ  it  with  common  software  and  computing  tools. 

A.  TOOLS  FOR  DIMENSION  SELECTION  AND  RANDOM  LATIN 
HYPERCUBE  GENERATION 

We  earlier  established  p  as  our  nonorthogonality  measure  for  an  LH  design. 

We  use  this  measure  to  select  an  RLH  design  with  an  acceptable  degree  of 
nonorthogonality.  First,  we  use  p  as  a  guide  to  select  an  appropriate  design  dimension 

for  an  experiment.  Second,  with  the  design  dimension  fixed,  the  nonorthogonality 
measure  serves  as  a  threshold  for  selecting  an  acceptable  design  from  iterative 
generations  of  RLH.  This  section  describes  the  development  of  correlation-based  tools 
and  the  procedures  for  their  application. 

We  use  these  p  -  based  tools  when  more  complex  algorithms  associated  with 

correlation  reduction  methods  are  not  readily  available  to  the  experimenter.  The  ease  of 
generating  an  LH  with  comparatively  unsophisticated  means  makes  them  an  attractive 
option  (Sugiyama  &  Chow,  1997)  for  creating  experimental  designs.  If  a  random 
generation  of  an  LH  has  acceptable  correlation  among  its  columns,  an  experimenter  can 
reap  the  benefits  from  an  efficient  design  with  little  computational  effort.  Cioppa  (2002) 
defines  efficient  experimental  designs  as  those  which  (i)  detect  as  many  significant 
variables,  nonlinear  effects,  interactions,  and  their  associated  ranges  as  possible, 

(ii)  declare  significant  as  few  non-significant  variables  and  interactions  as  possible,  and 

(iii)  accomplishes  (i)  and  (ii)  with  a  minimal  number  of  runs.  This  chapter  introduces  the 
mechanism  to  create  experimental  designs  that  possess  acceptable  levels  of 
nonorthogonality  from  uncorrected  RLHs. 

1.  Creating  the  /?””  Table 

We  first  create  a  series  of  correlation  tables  that  combine  design  dimensions, 
which  correspond  to  known  OLH  and  NOLH  designs  (Ye,  1998  and  Cioppa,  2002),  and 
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the  number  of  RLHs  to  generate  from  which  to  select  a  design.  Each  entry  in  the 
correlation  table  is  an  average  of  the  minimum  p  for  a  given  n  and  k. 

We  begin  our  work  with  an  initial  set  of  data  that  consists  of  42  n  by  k  design 
combinations.  Using  Cioppa’s  (2002)  convention  for  design  dimensions,  we  explore 

f  m  —  \\ 

combinations  of  n  =  2"'+l  for  m  =  1,...,8  and  k  =  m  +  for  m  =  1, _ ,16. 

V  2  ) 

Therefore,  we  examine  design  dimensions  as  small  as  {n  =  17,  k  =  7}  and  as  large  as 
{n  =  257,  k  =  121}.  We  consider  only  those  designs  with  n  >  k.  Designs  where  the 
number  of  runs  is  less  than  the  number  of  factors  produce  unstable  models  (Leon,  2002). 
We  enter  NA  (Not  Applicable)  for  n  <  k  combinations  in  the  tables. 

The  correlation  tables  also  account  for  the  number  of  RLHs  to  generate,  and  from 
which  to  select  an  acceptable  design.  We  let  G  be  the  number  of  designs  from  which  to 
select  our  experimental  plan.  Tang  (1998)  uses  a  similar  procedure,  comparing  several 
designs  before  selecting  one  to  apply  correlation  reduction  methods.  Other  scientists 
consider  many  more  designs  before  choosing  one  for  an  experimental  design.  To  create 
the  correlation  table  we  generate  200  (annotated  as  G200)  uncorrected  RLH  designs  for  a 
specific  n,  k  combination  and  compute  p  for  each.  From  this  set  of  generated  designs 

we  record  the  smallest  p  ,  which  we  designate  as  p'™p .  We  repeat  this  process  1,000 

times  for  each  n  by  k  combination  and  compute  the  average  of  the  1,000  p'™  values, 

P'map  •  This  value  is  the  entry  in  the  table  for  the  specific  design  dimension  and  estimates 

the  corresponding  expected  p'™"  among  200  uncorrected  RLH  designs.  Examination  of 

these  estimates  show  low  variability,  as  one  expects  from  extreme  statistics  (David  & 
Nagaraja,  2003),  and  acceptable  precision  for  our  work. 

The  different  design  dimensions  vary  greatly  in  the  values  of  their  ,  but  the 
largest  sample  standard  deviation  for  any  specific  design  dimension  is  0.025  (Figure  3). 
Therefore,  the  standard  deviation  of  /?”“  for  any  of  these  design  dimensions  will  also  be 

very  small,  considering  that  the  average  was  a  computation  from  1,000  independent 
observations.  Figure  3  shows  that  the  largest  standard  deviation  occurs  for  a  small  design 
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n  —  17,  k  =  16,  corresponding  to  p™°  =0.306.  The  smallest  standard  deviation  is  for  a 

large  RLH,  n  =  257,  k  =  106  corresponding  to  p™"  =  0.204 .  The  majority  of  these  have  a 
standard  deviation  that  is  less  than  0.01,  so  predictions  will  be  meaningful. 


Figure  3.  The  sample  standard  deviation  of  p™’'  for  1,000  G200  n  and  k  combinations 

is  relatively  small  compared  to  its  p™“  value.  The  largest  standard  deviation 
occurs  in  small  designs,  while  the  smallest  deviations  are  for  larger  designs. 

Table  4  is  the  correlation  table  for  different  design  combinations  for  G200.  Its 

organization  shows  that  as  the  number  of  runs  for  a  fixed  k  increases,  p™”  decreases, 

which  is  consistent  with  Owen  (1994).  Conversely,  p,1™  increases  as  k  increases  for  a 

fixed  n.  The  table  provides  the  design  combinations  which  are  likely  to  produce  a  design 
with  the  required  pmap  by  generating  200  RLFIs.  If  the  table  indicates  that  the  initial 

design  dimension  cannot  attain  the  desired  correlation  within  200  RLFIs,  then  the  table 
guides  the  experimenter  to  increase  n,  decrease  k,  or  both,  to  have  a  reasonable  chance  of 
obtaining  an  RLFI  with  an  acceptable  p  .  This  tool  allows  the  experimenter  to 

ascertain  a  realistic  expectation  of  p  for  a  given  design  dimension. 


18 


Table  4.  This  portion  of  the  G200  table  shows  the  /?“”  for  different  design 

combinations  where  the  number  of  designs  generated  in  each  trial  is  200.  We 
enter  not  applicable  (NA)  for  design  combinations  where  n  <  k. 


G200 

K7 

Kll 

K16 

K22 

K29 

K37 

K46 

K56 

K67 

K79 

K92 

K106 

K121 

N17 

0.3066 

0.4188 

0.4932 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

N33 

0.2108 

0.2931 

0.3503 

0.3923 

0.4267 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

N65 

0.1485 

0.2073 

0.2481 

0.2793 

0.3038 

0.3244 

0.3416 

0.3567 

NA 

NA 

NA 

NA 

NA 

N129 

0.1044 

0.1460 

0.1756 

0.1982 

0.2161 

0.2310 

0.2432 

0.2544 

0.2638 

0.2721 

0.2799 

0.2869 

0.2931 

N257 

0.0738 

0.1032 

0.1240 

0.1403 

0.1530 

0.1638 

0.1728 

0.1805 

0.1871 

0.1935 

0.1988 

0.2038 

0.2082 

Table  usage  is  straightforward.  Suppose  an  analyst  wishes  to  explore  20  factors, 
while  maintaining  p  <  0.20 ,  but  is  uncertain  to  the  number  of  runs  to  perform.  The 

shaded  block  of  cells  in  Table  4  indicates  that  it  is  possible  for  the  experimenter  to  obtain 
an  RLH  with  an  acceptable  pmap  within  200  RLHs  if  the  design  consists  of  somewhere 

between  65  and  129  runs.  The  table  also  guides  the  analyst  to  consider  fewer  factors. 
Additionally,  we  note  that  randomly  generating  an  NOLH  is  unlikely. 

One  could  also  increase  G.  However,  empirical  study  shows  that  increasing  G, 
even  to  1,000,  gives  only  marginal  improvement  from  G200.  Table  5  gives  the  standard 
deviations  of  G1000  RLHs  for  design  combinations  of  n  =  17  and  257  with  k  =  l  and  106 
and  compares  them  to  those  for  G200  RLHs.  The  greatest  difference  in  standard 
deviations  in  G1000  and  G200  designs  is  for  n  =  17,  k  =  7,  where  G1000  is  less  than 

G200  by  0.0039.  For  the  magnitudes  of  p'™"p  shown  in  Table  4,  0.0039  is  small. 


Table  5.  We  show  a  few  different  design  combinations  to  compare  the  standard 
deviations  of  1,000  G1000  and  G200  RLHs.  The  greatest  difference  in  standard 
deviations  between  G1000  and  G200  is  for  the  design,  n  =  17,  k  =  7,  where 
GlOOO’s  standard  deviation  is  smaller  by  0.0039. 


G1000 

k7 

kl06 

nl7 

0.0215 

NA 

n257 

0.0053 

0.0028 

G200 

k7 

kl06 

nl7 

0.0254 

NA 

n257 

0.0066 

0.0035 
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Likewise,  the  magnitude  of  p™”  for  G1000  and  G200  in  Table  6  shows  that  the 

differences  are  very  close  to  the  standard  deviations.  We  conclude  that  G200  is  a 
reasonable  number  of  RLHs  from  which  to  choose  the  design  with  the  best  observed  p 
since  the  improvements  from  a  sample  size  that  is  five  times  larger  are  small. 

Table  6.  A  comparison  of  /?“”  for  1000  G1000  and  G200  shows  that  the 
differences  are  within  the  magnitude  of  the  standard  deviations  shown  in  Table  5. 


G1000 

k7 

kl06 

nl7 

0.2737 

NA 

n257 

0.0652 

0.1994 

G200 

k7 

kl06 

nl7 

0.3066 

NA 

n257 

0.0738 

0.2038 

The  general  guidance  described  in  this  section  frames  the  design  dimensions  that 
are  necessary  to  fulfill  the  analyst’s  need.  However,  this  approach  only  provides  a  rough 
estimate  of  the  design  dimensions.  We  develop  another  tool  to  specify  n  and  k. 

2.  A  Function  to  Predict  p™'anp 

Our  goal  is  to  develop  an  equation  that  uses  the  values  of  n  and  k  to  predict 
from  200  RLH  designs.  The  objective  is  an  equation  that  is  sufficiently  simple  to  use 
with  a  calculator.  Using  data  from  the  table  with  the  p™"p  formula  allows  the 

experimenter  to  enter  different  (n,  k)  pairs  until  a  specific  design  dimension  meets  an 
acceptable  nonorthogonality  value. 

a.  Initial  Analysis  of  p'f" 

We  use  the  p"™  data  from  which  we  developed  the  tables  in  the  previous 
section  to  construct  a  parsimonious  predictive  equation.  Patterns  are  evident  in  the 
relationships  between  and  ( n ,  k)  when  either  n  or  k  is  constant  and  the  other 
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changes.  Grouping  p™'"  values  based  on  n  uncovers  clear  patterns  in  the  data.  Figure  4 

shows  that  as  k  increases  and  n  is  constant,  the  relationship  between  p""''p  and  k  appears 

logarithmic.  The  overlaps  in  the  groupings  of  n  as  k  increases  also  indicate  an  interaction 
between  n  and  k. 


Figure  4.  A  logarithmic  pattern  appears  between  p™"  and  k  when  n  is  constant: 
Groups  of  n  are  presented  in  increasing  order. 

Similarly,  Figure  5  displays  an  emerging  pattern  when  k  is  constant  and  n 

increases.  The  relationship  between  and  n  displays  an  exponential  decay.  These 

tendencies  for  p™"  regarding  n  and  k,  respectively,  provide  a  clue  to  their  transformation 
in  order  to  draw  a  linear  relationship  with  each. 
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n 


Figure  5.  An  exponential  decay  or  a  negative  power  function  relationship  appears 
between  and  n  when  k  is  constant.  For  clarity,  only  a  few  sets  for  fixed 
k  are  in  the  chart. 

Transforming  n  and  k  results  in  each  having  a  nearly  linear  relationship 
with  p™'"p  .  Owen  (1994)  shows  that  the  variance  of  the  root  mean  square  correlation 
( Prms )  of  an  uncorrected  LFI  design  is  related  to  n  ! .  We  examine  different 
transformations  of  n  to  detennine  its  linear  relationship  with  .  A  transformation  of 
n  2/3  shows  a  nearly  linear  relationship  when  k  is  constant.  Figure  6  shows  an  example  of 
a  nearly  linear  relationship  between  p™n  and  n  2  '  when  k=l. 
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Figure  6.  There  is  a  nearly  linear  relationship  between  and  transformed  n.  For 
illustration  we  show  the  graph  for  k  =  7. 

Similarly,  a  transformation  of  k  to  k  1  3  reveals  a  nearly  linear  relationship 
with  ,  when  keeping  /?  constant.  Figure  7  shows  an  example  of  a  nearly  linear 

relationship  between  p™"  and  /i  |  !  for  a  set  of  data  where  /?  =  257. 


Figure  7.  A  nearly  linear  relationship  appears  between  and  transformed  k.  We 
illustrate  the  case  for  «  =  257. 
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Owen  (1994)  studies  the  relationship  between  the  root  mean  square  of  an 
uncorrected  Latin  hypercube  sample  (LHS)  and  n,  as  n  increases.  Owen  shows 
empirically  that  the  relationship  between  the  prms  of  an  uncorrected  LHS  and  n  to  be 

l°g  ( A™ )  =  -0.5  log  (n) . 

Owen’s  results  support  the  exponentially  decaying  relationship  that  we 
find  between  p'f"  and  n.  We  note  that  this  equation  is  independent  of  k.  To  complete 

our  study,  we  explore  the  simultaneous  relationships  of  n  and  k  with  p'™p  . 

b.  Exploring  Simple  Linear  Regression  Models  for  p'ff 

Examination  of  the  relationship  between  /?““  and  the  two  transformed 
variables,  n  2/3  and  A:-13,  reveals  that  it  is  possible  to  fit  a  multiple  linear  regression 
model  to  the  data.  A  study  of  simple  linear  relationships  between  p'™p  and  each  of  the 
transformed  values  shows  that  they  are  nearly  linear.  Because  of  an  earlier  indication 
that  an  interaction  exists  between  n  and  k,  we  also  explore  the  relationship  between 

and  h~2,3AT1/3  .  Fitting  simple  linear  regression  models  of  p'™p  versus  n  2  3 ,  AT13,  and 
n  V2k  1,3  results  in  each  model  having  an  R1  of  0.98  or  greater. 

c.  A  Multiple  Linear  Regression  Model  for  p'fp 

The  above  results  indicate  that  a  multiple  linear  regression  model  with  two 
main  effect  terms  and  one  interaction  term  can  be  a  good  predictor  for  p"'fp  .  To  fit  a 

multiple  linear  regression  model  we  use  p'fp  data  from  1 15  different  design  dimensions. 
The  least  squares  fit  gives  an  R2  =  0.99  .  Our  estimated  predictor,  which  we  denote  by 
(p'z)  ,  is  given  by 

(  pZp  f  =  °-0873 + v.859 >f 2/3  -  o.io9r1/3  - 1 1  .702a  2/3r1/3 . 
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d.  Adequacy  of  the  Multiple  Linear  Regression  Model  for  p'f"p 

The  multiple  linear  regression  model  fit  adequately  estimates  expected 
P'map  •  The  statistics  from  the  predicted  values’  residuals  are  given  in  Table  7.  Of  course, 

the  residual  mean  is  zero.  A  standard  deviation  of  0.00966  and  a  range  of  0.05  are 
relatively  small  for  the  values  that  the  model  is  predicting.  The  median  and  the  skewness 

value  indicate  that  the  model  tends  to  slightly  underestimate  the  actual  /?”“  values. 


Table  7.  Descriptive  statistics  from  the  residuals  derived  from  the  MLR  predictions 
show  negative  skewness  and  a  relatively  small  range.  The  model  tends  to  slightly 
underestimate  the  value  of  the  minimum  maximum  absolute  pairwise  correlation 

value  of  200  RLH  designs. 


Statistics  for  Residuals 

Median 

0.0029624 

Standard  Deviation 

0.0096594 

Kurtosis 

0.8707658 

Skewness 

-1.0013506 

Range 

0.0501953 

Minimum 

-0.0364850 

Maximum 

0.0137103 

Count 

115.0000000 

It  is  evident  from  Figure  8  that  there  are  some  curvilinear  tendencies  in  the 
residual  plot  and  the  residuals  do  not  follow  a  normal  distribution,  but  the  residuals  are 
not  overly  large  with  respect  to  the  predicted  values.  Another  model  that  includes 
quadratic  terms  improves  the  predictive  capabilities  of  the  model,  but  requires  a  more 
complex  expression  (see  Appendix  E).  Considering  the  parsimony  that  we  desire  and  the 
relatively  small  errors,  we  accept  this  model. 
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Residuals  vs.  pmin 


Predicted  n 

r  map 


Figure  8.  This  residual  plot  of  1 15  different  predicted  p™'",  values  shows  curvature. 
e.  Application  of  the  (/C)  Mode! 

We  now  discuss  the  use  of  the  |  p'™p  j  equation  to  estimate  expected 

in  conjunction  with  the  p'™p  table.  The  table  provides  general  guidance  for 
the  range  of  n  and  k  in  which  a  specific  correlation  value  may  reside.  Because  of  its 

simplicity,  the  |  p'™p  j  equation  can  easily  be  incorporated  into  a  spreadsheet  to  compute 

estimated  values  of  p'ff ,  as  the  experimenter  varies  n  and  k.  Calculations  stop  when  the 

experimenter  finds  an  n  and  k  combination  that  results  in  an  acceptable  correlation  value. 
These  are  the  design  dimensions  that  afford  the  analyst  the  best  chance  of  finding  an 
uncorrected  RLFI  with  the  desired  pmap .  Because  the  equation  is  sufficiently  simple,  one 

can  also  solve  for  any  of  the  three  values,  when  the  other  two  are  fixed. 

To  produce  the  design  for  designated  n  and  k  we  generate  an  RLH  and  compute 
p  .  The  experimenter  may  use  any  number  of  software  programs  to  produce  RLFI 
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designs  and  to  measure  p  .  Within  G  generations  of  RLH  designs,  we  assert  that  the 
analyst  will  find  a  design  with  an  acceptable  p  . 


B.  DESIGN  DEVELOPMENT  WITH  CORRELATION-BASED  TOOLS 

We  employ  the  correlation-based  tools  that  we  have  developed  in  Section  A.  A 
step-by-step  procedure  guides  experimenters  in  implementing  these  tools.  A  constructive 
example  illustrates  the  leverage  that  experimenters  can  gain  from  RLH  designs. 

1.  The  Process  for  Dimension  Selection  and  RLH  Design  Generation 

Our  procedure  for  obtaining  an  experimental  RLH  design  with  an  acceptable 
nonorthogonality  measure  has  the  following  steps. 

Step  1:  Enter  the  p™n  table  to  obtain  general  guidance  about  design  dimensions.  Entry 
into  the  table  may  occur  in  one  of  four  ways: 

•  The  experimenter  may  have  a  correlation  threshold  for  a  specific  number 
of  factors. 

•  The  experimenter  may  have  a  limit  for  the  number  of  sample  runs  and  a 
correlation  threshold. 

•  The  experimenter  may  have  a  correlation  threshold  and  an  idea  for  the 
design  dimensions. 

•  The  experimenter  may  have  a  correlation  threshold  with  little  or  no  idea 
about  the  design  dimensions. 

Step  2:  From  the  table,  select  the  range  of  n  and  k  based  on  the  correlation 

threshold.  If  required,  rough  interpolation  within  the  table  values  is  sufficient  to  define 
the  ranges. 

Step  3:  Establish  a  spreadsheet  to  compute  |  p™'"p  j  for  each  design  combination. 

Step  4:  Calculate  values.  Best  practices  reveal  that  fixing  k  first,  while 

changing  n,  is  a  good  approach. 
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Step  5:  Stop  searching  combinations  when  a  value  of  j  that  approximately  meets 
the  correlation  threshold  is  calculated. 

Step  6:  Record  the  design  combination  (n,  k )  that  meets  the  correlation  threshold. 

Step  7:  Fix  the  design  dimension  and  generate  up  to  G  number  of  RLH  designs. 
Measure  p  for  each  new  design  and  stop  when  the  actual  p  is  satisfactorily  close  to 

the  correlation  threshold  or  all  G  designs  have  been  generated.  Note  that  this  does  not 
guarantee  that  the  threshold  is  met. 

Following  these  steps,  an  experimenter  can  obtain  an  acceptable  design  without  a 
large  investment  in  computational  costs  and  time.  We  designate  the  resulting  RLH 
design  as  R"k  .5 

2.  A  Hypothetical  Case  Requiring  an  Uncatalogued  Design  Dimension 

This  example  clarifies  the  steps  for  determining  the  design  dimensions  and 
generating  an  acceptable  RLH.  An  experimenter  seeks  an  experimental  design  to  assess 
k  =  20  factors  with,  at  most,  n  =  100  runs.  Since  a  readily  available  N^0  or  O\°0°  design 
does  not  exist,  the  researcher  could  try  our  methodology.  The  experimenter  determines 
that  an  experimental  design  with  pmap  <  0.22  is  acceptable.  To  get  an  initial  idea  if  a 

design  that  the  scientist  needs  is  possible,  we  enter  Table  8  with  correlation  value,  0.22, 
k  =  20,  and  n  =  100.  Because  there  are  20  factors  to  assess,  the  experimenter  looks  down 
the  columns  of  K16  and  K22,  which  bracket  k  =  20.  The  shaded  part  of  Table  8  shows 
that  the  RLH  design  that  can  meet  the  correlation  threshold  will  have  between  65  and  129 

runs,  because  the  values  at  these  two  sample  run  values  contain  the  desired  pmap . 

Since  the  number  of  factors  is  fixed,  k  =  20,  the  experimenter  is  concerned  with  the  range 
65  <  n  <  100  . 


5  We  pattern  this  naming  convention  similar  to  Cioppa  and  Ye:  Cioppa  (2002)  designated  NOLH 
designs  as  N"k  and  Ye  (1998)  designated  OLH  designs  as  0\.  The  symbol  Rk  emphasizes  its 
random  nature. 
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Table  8. 


This  table  is  a  section  of  a  G200  table.  It  has  values  for  dimensions 

with  k  =  22  or  less. 


G200 

K7 

Kll 

K16 

K22 

N17 

0.3066 

0.4188 

0.4932 

NA 

N33 

0.2108 

0.2931 

0.3503 

0.3923 

N65 

0.1485 

0.2073 

0.2481 

0.2793 

N129 

0.1044 

0.1460 

0.1756 

0.1982 

N257 

0.0738 

0.1032 

0.1240 

0.1403 

After  approximating  the  design  dimensions,  the  experimenter  uses  the  equation 

[pZ  f  =  °-°873  +  7.859«-2/3  -  °.  i 09k  ^  -  i  i  .702,T2/3r 1/3  to  compute  values  from 

different  design  combinations  until  one  is  satisfactorily  close  to  the  correlation  threshold. 
As  we  noted  earlier,  if  the  experimenter  fixes  two  of  the  values  in  the  equation,  the  third 
may  be  found.  We  recognize  that  this  equation  provides  an  estimate  of  the  expected 
P'ma,,  •  Since  the  standard  deviation  is  small,  the  actual  will  usually  be  close  to  this 

average  value.  However,  about  half  of  the  time  /?””  will  be  above  it.  If  the  requirement 
is  strict,  we  can  repeat  the  process  several  times  with  a  high  probability  (approximately 
1  -(0.5)#,  m/s  )  of  obtaining  the  threshold.  Another  approach  is  to  use  a  conservative  value 
of  the  target  p'™p . 

The  experimenter  holds  k  =  20  constant  and  incrementally  increases  n.  Table  8 
shows  that  at  n  =  65,  the  correlation  is  between  0.2481  and  0.2793.  The  spreadsheet 
results  in  Table  9  show  that  an  design  can  have  a  pmap  that  is  approximately  0.22.  In 

this  case,  it  is  possible  to  find  a  design  that  does  not  require  all  100  sample  runs. 
Depending  on  the  cost  of  each  run,  the  experimenter  can  extend  the  budget. 
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Table  9.  Calculations  for  using  the  MLR  model  as  n  changes. 


It 

k 

/  rninf 

[  Pmap  j 

81 

20 

0.236 

83 

20 

0.233 

85 

20 

0.230 

87 

20 

0.228 

89 

20 

0.225 

91 

20 

0.222 

To  validate  the  estimate  for  p  for  the  design,  we  use  R,  a  free  software 

version6  of  the  statistical  package  S-Plus,  to  generate  200  R^0  designs,  while  recording 

the  pmup  values.  We  successfully  produce  an  RLH  design  with  p  equal  to  0.219.  It 

may  be  possible  to  increase  n  to  93,  but  the  current  result 
is  acceptable. 

Any  experimenter  can  follow  the  process  that  we  have  described  to  construct  a 
design  that  does  not  exist,  or  when  there  is  a  lack  of  computing  power  or  software.  An 
experimenter  may  also  not  require  an  orthogonal  design  and  desire  some  correlation 
(Iman  &  Shortencarrier,  1985).  Iman  and  Conover  (1982)  discuss  ways  to  induce 
correlation,  but  RLH  designs  are  an  available  alternative.  Additionally,  the  new  design 

dimensions  that  these  tools  provide  are  advantageous.  The  combination  of  p™np  tables, 

equations,  and  RLH  generation  give  researchers  a  simple,  fast,  and  effective 

means  to  construct  experimental  designs  with  acceptable  nonorthogonality.  Of  course, 
this  procedure  requires  a  good  random  number  generator  (Law  &  Kelton,  1999). 

C.  EFFECTS  OF  FEWER  CANDIDATE  DESIGNS  (G) 

There  may  be  cases  in  which  the  experimenter  does  not  wish  to  generate  G  =  200 
RLHs  before  selecting  a  suitable  design.  The  manner  in  which  the  experimenter 

6  R  is  freeware.  The  interested  reader  can  contact  the  author  or  check  the  SEED  Center  for  Data 
Farming  website  at  http://harvest.nps.edu  for  the  R  code. 


30 


generates  RLH  designs  may  also  be  a  constraint.  To  mitigate  the  impact  of  such 
circumstances,  we  provide  p“”  tables  and  |  p”“  j  expressions  for  different  values  of  G. 

We  introduce  the  symbol  ( p™" )  as  the  smallest  maximum  absolute  pairwise  correlation 
value  in  G  trials.  The  corresponding  equation  to  estimate  the  expected  smallest 
maximum  absolute  pairwise  correlation  value  in  G  iterations  is  designated  as  |  p™“  j  . 

Investigating  the  impact  of  different  values  of  G  reveals  two  notable  observations. 
We  empirically  show  that  for  G  >  200,  the  values  of  p™np  vary  only  slightly.  Conversely, 

as  G  decreases,  the  variance  in  p”“  increases,  but  not  overly  so.  As  a  result,  the 

expressions  show  similarity.  However,  the  differences  are  important.  To  retain  utility  to 
experimenters,  we  set  the  lower  bound  for  G  at  10  and  investigate 
G  e  {10,  25,  50,  75,  100,  125,  150,175,200}. 

With  some  slight  modifications  we  use  the  same  methodology  to  explore  the 
relationship  of  transformed  n  and  k  values,  as  well  as  their  interaction  tenn.  We  fit 

multiple  linear  regressions  for  each  G.  We  catalogue  j  p™  j  tables  and  corresponding 
|p”“j  models.  In  our  initial  effort,  we  use  42  design  dimensions  to  develop  the 
model.  We  refine  the  ( p™“ )  model  fit  using  all  115  design  combinations 

G200  V  p  /  G200 

and  further  use  these  data  to  develop  the  equations  that  follow. 


(  min  | 
r  map  I 


P, 


map 


G175 


Pm 


G150 


=  0.0873  +  7.859>r2/3  -  0. 109A:  1/3  - 1 1 ,702>r2/3r1/3 

=  0.0873  +  7.864>r2/3  -  0. 109A  1/3  - 1 1 .682 A2/3r1/3 

=  0.0874  +  7.870/r2/3  -  0. 108AT 1/3  - 1 1 .650 »“2/3r 1/3 

=  0.0875  +  7.872n~2/3  -0.1 07AT 1/3  -11.61  fir2/3r 1/3 


(2.1) 

(2.2) 

(2.3) 

(2.4) 


G125 
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=  0.0875  +  7.883>r2/3  -  0.106^  1/3  - 1 1 .578 n2nk~m 

0.0877  +  7.886 >r2/3  -  0.105^~I/3  - 1 1 .502 nink~m 

0.0877  +  7.912tT2/3  -  0. 103/c  1/3  - 1  1 .423/7  2/3/c  1/3 

0.088 1  +  7.945m~2/3  -  0.0988/t  1/3  - 1 1 .270«~2/3r1/3 

0.0883  +  7.996h  2/3  -  0.0902A:  1/3  - 1 1 .014  ninh~m 


(2.5) 

(2.6) 

(2.7) 

(2.8) 

(2.9) 


These  equations  have  similarity  in  their  coefficients,  but  the  subtleties  in  each 
expression  are  important.  The  differences  result  in  fairly  precise  estimates  of  expected 
P'ma,,  from  a  given  G  RLHs.  This  set  of  equations  provide  the  experimenter  with  an 
option  for  the  number  of  RLH  designs  to  generate,  along  with  design  choices  of  n,  k,  and 

P map  * 

D.  SUMMARY 


Scientists  often  use  uncorrected  LH  designs  and  deal  with  effects  of  having  high 
correlation  among  the  columns  of  the  design  matrix.  Our  efforts  simplify  the  process  for 
any  experimenter  to  obtain  a  design  that  meets  a  correlation  threshold,  which  may  not 
necessarily  be  orthogonal,  or  nearly  orthogonal.  We  define  the  p  of  an  LH  as  a  single 

measure  of  its  nonorthogonality.  We  describe  tools  based  on  this  measure,  and  present  an 
algorithm  that  combines  them  with  RLH  generation  to  obtain  designs  with  acceptable 

nonorthogonality.  Tables  for  p'™'  for  specific  values  of  G  are  reusable,  as  are  the 

formulas  for  j  p™“  j  .  The  availability  of  packages  that  easily  generate  LHs  and  the 

ubiquity  of  off-the-shelf  analytical  software  offer  scientists  a  powerful,  but  simple,  set  of 
tools  to  obtain  a  desired  experimental  design. 
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III.  CORRECTED  RLH  USING  FLORIAN’S  METHOD  FOR 

NEW  DESIGNS 


This  chapter  explores  the  utility  of  applying  Florian’s  method  to  reduce 
correlation  in  LHs.  We  iteratively  apply  Florian’s  method  to  produce  new  OLH  and 
NOLH  designs.  Cioppa  (2002)  conjectures  that  it  is  necessary  to  generate  a  large  number 
of  candidate  LH  designs  from  which  to  choose  a  few  that  meet  nonorthogonality 
thresholds  before  applying  correlation  reduction  methods.  In  some  cases,  Cioppa 
suggests  a  million  or  more  candidate  designs.  We  explain  the  relationship  of  the  size  of 
the  experiment  and  the  number  of  candidate  RLH  designs  to  generate  before  applying 
Florian’s  method.  A  study  of  this  relationship  reveals  that  the  number  of  candidate 
designs  is  often  much  less  than  1,000  for  even  large  experiments. 

A.  FLORIAN’S  CORRELATION  REDUCTION  METHOD 

This  section  briefly  describes  Florian’s  method  to  understand  its  mathematical 
foundations  (Florian,  1992).  Consider  an  initial  RLH  design,  X,  with  dimension  n  and  k. 
Because  the  number  of  unique  value  levels  for  any  factor  in  an  LH  is  equal  to  the  number 
of  runs  (n)  in  the  design  matrix,  there  are  no  ties.  We  convert  the  design  into  a  matrix  of 
ranks,  R,  with  the  same  dimension  as  the  design  matrix.  The  entries  in  each  column  in  R 
are  the  ranks  of  the  values  for  the  corresponding  variable. 

We  use  the  rank  correlation  matrix  of  R  to  construct  the  transformation  matrix. 
The  rank  correlation  matrix,  T,  consists  of  the  pairwise  correlations  of  the  columns  of  R. 
The  value  is  Spearman’s  coefficient  of  correlation  (Conover,  1999)  for  the  columns 

Rj  and  Rj  for  all  pairs  (/,/)  in  R,  with 

TlJ=1~  “n(n2- 1)  ’ 

where  R\  is  the  rank  of  the  /th  element  in  the  7th  column.  Spearman’s  computation  of  the 
correlation  is  a  modification  of  Pearson’s  sample  coefficient  of  correlation,  based  on 
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ranks  when  there  are  no  ties.  Thus,  it  is  appropriate  for  our  use  since  each  value  in  the 
design  column  is  unique. 

Clearly  T  is  symmetric  and  positive  definite.  Therefore,  it  is  factorable  through 
Cholesky’s  decomposition  (Leon,  2002).  We  seek  a  transformation  matrix,  S  ,  such  that 
STS  is  equal  to  the  identity  matrix,  where  S  =  Q  1  and  T  =  QQ  . 

Determining  Q  involves  Cholesky’s  method.  Because  T  is  symmetric,  a  lower 
triangular  matrix,  L,  exists  such  that  T  =  LDL  ,  where  D  is  a  diagonal  matrix  of  positive 
values.  Cholesky’s  decomposition  states  that  Lx  =LDh2  and  that  LXLX  =T  .  Therefore, 

Q  =  Lx  and  5  =  Q  1  =  L\l  =  ( LD 1/2 ,  so  S'  =  i^LDV2 )  '  J  . 

Florian  uses  S  to  update  R.  The  new  rank  matrix  is  Rnew  =  RS  and  the  factor 

level  values  of  X  are  rearranged  to  meet  the  new  rank  structure,  which  usually  results  in 
reduced  correlation  among  its  columns  (Florian,  1992). 

B.  DIMENSIONALITY  AND  FLORIAN’S  METHOD 

A  Florian-improved  RLH  (FRLH)  design  is  the  result  of  a  relatively  simple 
method  for  creating  uncatalogued  nearly  orthogonal  designs.  We  produce  one  RLH, 
compute  its  pmap ,  and  apply  Florian’ s  method.  We  compute  pmap  for  the  transformed 

matrix  and  compare  the  two  nonorthogonality  measures.  If  there  is  an  improvement,  we 
reapply  Florian’s  method.  We  iterate  until  p  no  longer  improves  and  store  the  last 

improved  design.  Table  10  contains  the  best  pmap  from  20  separate  instances  of  this 

process  for  each  of  many  different  RLH  design  dimensions.  We  highlight  a  number  of 
the  design  dimensions  that  previous  size  restrictions  prevent.  Notably,  we  find  an 
improved  n  =  33,  k  =  16  design  that  has  a  p  of  0.031,  which  meets  Ang’s  (2006) 

NOLH  criteria.  With  Cioppa’s  (2002)  NOLH  convention,  exploring  16  factors  requires 
at  least  65  runs.  Ang  (2006)  reduces  the  number  of  runs  to  33.  Our  unique  n  =  33,k=  16 
design  equals  Ang’s  achievement.  Additionally,  our  method  increases  k  and  introduces  a 
new  n  =  33,  k  =  22  design  with  pmap  equal  to  0.034.  Again  using  Cioppa’s  (2002)  and 
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Ye’s  (1998)  naming  convention,  we  use  the  symbol  F,"  to  denote  an  FRLH  design  with  n 
runs  and  k  factors  since  it  may  not  necessarily  result  in  an  OLH  or  NOLH. 


Table  10.  The  best  maximum  absolute  pairwise  correlation  values  from  20  FRLHs 
of  the  same  design  dimension  are  presented.  Gray-shaded  design  dimensions 
emphasize  intervals  that  this  technique  can  fill  in  the  OLH  and  NOLH  catalogue, 
using  Ang’s  (2006)  near  orthogonal  threshold.  Diagonal-lined  areas  identify 
dimensions  yet  to  meet  NOLH  criteria. 


K 

N 

7 

11 

16 

22 

29 

37 

46 

56 

17 

0.054  0.061] 

0.105 

1  NA 

NA 

NA 

NA 

NA 

25 

0.039 

0.039 

0.053 

f).08Q 

NA 

NA 

NA 

NA 

33 

0.028 

0.030 

0.031 

0.034 

0054 

NA 

NA 

NA 

49 

0.019 

0.018 

0.018 

0.020 

0.022 

0.036 

0.056 

NA 

65 

0.013 

0.014 

0.013 

0.015 

0.015 

0.017 

0.019 

0.039 

97 

0.009 

0.009 

0.010 

0.010 

0.010 

0.010 

0.010 

0.012 

129 

0.007 

0.008 

0.006 

0.007 

0.007 

0.006 

0.007 

0.007 

193 

0.005 

0.005 

0.004 

0.004 

0.005 

0.005 

0.005 

0.005 

257 

0.003 

0.004 

0.003 

0.003 

0.003 

0.004 

0.003 

0.003 

513 

0.002 

0.002 

0.002 

0.002 

0.002 

0.002 

0.002 

0.002 

For  design  dimensions  that  meet  our  guidance,  Florian’s  method  is  often 
sufficient  to  produce  an  NOLH  design  from  a  single  RLH.  Cioppa  (2002)  generates 
millions  of  designs  before  selecting  a  few  on  which  to  apply  Florian’s  method. 
Tang  (1998)  empirically  studies  the  effects  of  initial  designs  when  reducing  correlation. 
Tang  finds  that  for  small  values  of  n  and  k,  the  initial  design  has  a  large  effect  on  the  final 
design’s  correlation  value.  To  mitigate  this  effect,  Tang  generates  three  designs  and 
chooses  the  one  with  the  least  correlation  as  the  initial  design.  As  n  increases,  the  effects 
of  the  initial  design  diminish.  We  provide  empirical  evidence  that,  for  smaller  design 
dimensions,  the  initial  design  has  greater  impact. 

We  produce  many  new  NOLH  designs  with  iterative  applications  of  Florian’s 
method.  While  previous  methods  are  constrained  to  one  exact  design  for  a  given 
dimension,  our  technique  can  result  in  many  NOLH  designs  for  the  same  dimension. 
Additionally,  this  method  results  in  many  new  large  NOLH  designs.  For  instance,  we 
create  an  N]H  design  that  has  pmap  =  0.00267  in  50  minutes  using  a  standard  2  gigahertz 


desktop  computer  with  3  gigabytes  of  RAM. 
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We  further  examine  the  role  that  design  dimensions  play  in  the  effectiveness  of 
Florian’s  method.  Using  only  one  candidate  RLH  for  each  design  dimension,  we 
examine  53  different  design  dimensions  in  which  the  ratio  of  k  and  n  is  less  than  or  equal 
to  0.333.  Of  the  53  designs,  52  met  NOLH  criteria  after  iterative  application  of  Florian’s 
method.  In  all  but  three  of  the  designs,  n  was  greater  than  or  equal  to  49.  After  empirical 
evaluation,  we  develop  a  rule  of  thumb  that  any  design  dimension,  where  n  >  50  and 

fl 

k  <  — ,  likely  results  in  a  design  that  meets  NOLH  criteria. 

A  comparison  of  these  new  F"  designs  with  catalogued  NOLH  designs  provides 
some  gauge  for  the  veracity  of  our  claim  regarding  Florian’s  method.  Table  11  gives 
Pump  before  and  after  application  of  Florian’s  method  to  a  single  RLH.  We  compare  an 

Fjf  design  with  the  actual  pmap  value  of  the  catalogued  NOLH  design.  In  ten  trials,  we 
see  that  all  ten  Fi f  designs  each  have  a  p  that  is  less  than  the  pmap.  We 
acknowledge  that  Cioppa  also  sought  space-fdling  characteristics  for  his  designs.  We 
include  the  F condition  numbers  (CN),  which  also  show  improvement.  Other  trials  for 
different  design  dimensions  yield  similar  results.  Although  not  all  comparisons  favor  the 
F"  designs  as  having  better  properties  than  designs,  they  are  very  similar  in 
measured  values  and  still  meet  nonorthogonality  thresholds. 
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Table  11. 


Comparison  of  pmap  values  and  CNs  for  RLH,  ,  and  . 


Before  Florian 

As 

N65 

JV16 

P  map 

CN 

P map 

CN 

P map 

CN 

Trial  1 

0.3012 

6.7312 

0.0176 

1.0929 

0.0219 

1.1030 

Trial  2 

0.4272 

5.5712 

0.0191 

1.0992 

0.0219 

1.1030 

Trial  3 

0.4176 

7.5441 

0.0149 

1.0852 

0.0219 

1.1030 

Trial  4 

0.3831 

5.9205 

0.0215 

1.1028 

0.0219 

1.1030 

Trial  5 

0.3216 

6.5351 

0.0156 

1.0818 

0.0219 

1.1030 

Trial  6 

0.3475 

6.2600 

0.0157 

1.0853 

0.0219 

1.1030 

Trial  7 

0.3096 

6.6278 

0.0172 

1.0893 

0.0219 

1.1030 

Trial  8 

0.4438 

7.5144 

0.0158 

1.0949 

0.0219 

1.1030 

Trial  9 

0.4882 

7.9720 

0.0179 

1.0911 

0.0219 

1.1030 

Trial  10 

0.3221 

6.7882 

0.0198 

1.0901 

0.0219 

1.1030 

These  results  lead  to  a  loose  constraint  for  applying  Florian’s  method.  When 

fl 

n  >  50  and  k  <  — ,  we  can  usually  transform  RLH  designs  to  meet  Cioppa’s  (2002) 

NOLH  criteria:  p  <  0.03  and  condition  number  less  than  or  equal  to  1.13,  as  well  as 

Ang’s  (2006)  thresholds  of  p  <  0.05  and  condition  number  less  than  or  equal  to  1.20. 

Ang’s  criteria  allow  more  latitude  and  we  therefore  adopt  them  as  our  own.  We  note  that 
a  departure  from  these  guidelines  still  results  in  a  reasonable  reduction  of  the  column 
correlation  in  the  design.  Later  discussions  in  this  study  show  new  methods  that  further 
relax  even  these  loose  constraints. 

C.  CONSTRUCTING  NEARLY  ORTHOGONAL  LATIN  HYPERCUBES 

A  methodical  process  to  generate  NOLH  designs  with  desired  orthogonal 
characteristics  may  be  easily  implemented.  The  first  iteration  of  Florian  results  in  a  new 
design  matrix,  Anew.  However,  p  for  this  new  design  matrix  may  not  be  acceptable 

and  it  may  be  possible  to  improve  Anew  even  further.  Steps  to  iteratively  update  the  latest 
design  to  create  a  design  with  the  smallest  p  follow. 

Step  1:  Generate  an  RLH,  X0m. 

Step  2:  Compute  pmap  of  Xold  and  designate  as  [pmap  . 
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Step  3:  Apply  Florian’s  method  to  X0u  to  create  Xnew. 

Step  4:  Measure  pmap  of  Anew  and  designate  as  (pmap  . 

Step  5:  If  (pmap  )new is  less  than  [pmap  ,  then  rename  Ancw  to  Xold  and  (pmap  )^to 

(Pmap  )0!d  and  return  to  Step  3. 

Step  6:  At  termination,  XM  is  the  best  design. 

Florian’s  method  overcomes  the  dimension  limits  that  other  correlation  reduction 
methods  face.  To  illustrate  the  extent  that  Florian’s  method  fills  the  gaps  in  NOLFI 
design  dimensions,  we  explore  multiple  LH  designs  that  receive  this  treatment.  Table  12 
shows  p  values  for  a  collection  of  new  designs  after  treatment  with  Florian’s  method. 

As  in  previous  discussions,  we  do  not  develop  designs  where  n  <  k  .  As  expected  (Owen, 
1994),  designs  in  which  A:  — >  «  demonstrate  an  increase  in  p  .  This  table  includes  a 

number  of  design  dimensions  that  do  not  exist  in  the  current  catalogue  of  OLH  and 
NOLH  designs.  To  follow  the  guidelines  for  NOLFI  design  dimensions,  a  design  with  22 
factors  (m  =  7)  requires  129  runs.  We  show  that  a  design  for  49  runs  and  22  factors  with 
pmap  of  0.020  can  be  obtained  by  Florian’s  method.  In  fact,  we  catalogue  NOLH  designs 

for  97  and  more  runs,  as  well  as  for  values  of  n  between  these  values. 


Table  12.  These  are  the  best  maximum  absolute  pairwise  correlation  values  from  20 
FRLHs  of  the  same  design  dimension.  These  designs  are  a  continuation  of 
Table  10.  All  of  these  design  dimensions,  as  well  as  the  design  combinations 
between  their  intervals,  fill  many  of  the  gaps  in  the  OLH  and  NOLH  catalogue. 


K 

N 

7 

11 

16 

22 

29 

37 

46 

56 

67 

79 

92 

106 

121 

137 

154 

172 

17 

0.054  0.061  0.105 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

25 

0.039 

0.039 

0.053 

0.080 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

33 

0.028 

0.030 

0.031 

0.034 

0.054 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

49 

0.019 

0.018 

0.018 

0.020 

0.022 

0.036 

0.056 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

65 

0.013 

0.014 

0.013 

0.015 

0.015 

0.017 

0.019 

0.039 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

97 

0.009 

0.009 

0.010 

0.010 

0.010 

0.010 

0.010 

0.012 

0.012 

0.0530.052 

NA 

NA 

NA 

NA 

NA 

129 

0.007 

0.008 

0.006 

0.007 

0.007 

0.006 

0.007 

0.007 

0.008 

0.010 

0.018 

0.030 

0.058 

NA 

NA 

NA 

193 

0.005 

0.005 

0.004 

0.004 

0.005 

0.005 

0.005 

0.005 

0.006 

0.005 

0.005 

0.006 

0.010 

0.012 

0.042 

0.037 

257 

0.003 

0.004 

0.003 

0.003 

0.003 

0.004 

0.003 

0.003 

0.004 

0.004 

0.004 

0.004 

0.006 

0.005 

0.005 

0.012 
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fl 

We  conclude  that,  for  an  experiment  requiring  n  >  50  and  k  <  — ,  we  can  produce 

many  new  NOLH  designs  with  Florian’s  method.  As  Table  12  shows,  we  can  sometimes 
violate  this  rule  of  thumb  and  still  produce  an  NOLH,  such  as  the  n  =  49, 
k  =  22  design  (shaded  in  Table  12)  discussed  previously.  We  also  see  that  as  k 
approaches  n,  Florian’s  method  finds  it  difficult  to  produce  a  design  with  p  <  0.05 , 

such  as  the  n  =  25,  k  =  22  design  that  has  p  =  0.08 .  We  explore  a  new  approach  to 
solve  this  new  aspect  of  the  experimental  design  problem. 

D.  SUMMARY 

The  body  of  work  to  reduce  or  eliminate  correlations  in  LH  designs  is  extensive. 
OLHs  (Ye,  1998)  and  NOLHs  (Cioppa,  2002)  have  earned  reputations  as  highly  useful 
designs.  Historically,  construction  of  these  designs  is  computer  intensive  and  time 
consuming.  Our  findings  show  that  Florian’s  method  and  RLH  generation  can  save  the 
analyst  a  significant  amount  of  time  and  computational  cost.  Moreover,  by  starting  with 
new  RLHs,  we  can  generate  many  such  designs  and  use  other  criteria  (e.g.,  space-filling 
properties  or  higher-order  correlations)  to  discriminate  between  multiple  NOLHs.  Given 

fl 

our  simple  rule  of  thumb,  where  n  >  50  and  k<  —  ,  many  new  NOLH  designs  are 

possible  with  this  technique.  In  many  cases  where  this  rule  of  thumb  does  not  hold, 
Florian’s  method  is  still  effective  enough  to  produce  a  design  that  meets  NOLH 
nonorthogonality  criteria. 
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IV.  A  MIXED  INTEGER  PROGRAMMING  APPROACH  TO 

MINIMIZE  Pmap 


This  chapter  explores  optimization  as  a  means  to  construct  orthogonal  Latin 
hypercube  (OLH)  and  nearly  OLH  (NOLH)  designs,  thereby  further  expanding  the 
situations  in  which  they  can  be  used.  We  are  aware  of  the  difficulties  that  an 
optimization  approach  presents  for  solving  the  experimental  design  problem.  As  n  and  k 
increase,  the  dimensionality  of  LHs  makes  optimization  an  unattractive  option.  As 
Chapter  I  explains,  a  vast  number  of  unique  LHs  are  possible  for  even  moderate  values  of 
n  and  k.  If  an  exhaustive  search  were  applied,  even  moderately  large  values  of  n  and  k 
would  prove  to  be  impossible  to  solve  to  optimality.  For  instance,  to  develop  an  Nff 
design  with  an  optimization  model  that  considers  all  possible  permutations  may  involve 
an  algorithm  that  can  explore  up  to  (33!)1'  LHs  to  guarantee  an  optimal  solution. 

Techniques  such  as  the  simplex  method  (Nash  &  Sofer,  1996)  are  more  efficient  in 
finding  an  optimal  answer  when  all  variables  are  continuous.  However,  nonlinear  and 
integer  methods  run  into  a  “combinatorial  explosion”  (Wolsey,  1998)  that  make  solving 
the  complete  design  of  experiment  problem  computationally  intractable  as  n  and  k  grow 
large.  Therefore,  we  develop  a  construction  methodology  based  on  a  focused 
optimization  routine,  which  greatly  expands  the  range  of  values  for  n  and  k  for  which  we 
can  create  orthogonal  and  nearly  orthogonal  designs. 

We  combine  RLH  generation,  Florian’s  correlation  reduction  method,  and 
optimization  of  a  mixed  integer  programming  problem  to  develop  a  new  algorithm  that 
relaxes  existing  size  restrictions  and  fills  many  gaps  that  exists  in  the  OLH  and  NOLH 
library.  We  continue  to  use  p  as  the  key  measure  for  discriminating  between  designs. 

To  best  illustrate  the  power  of  this  new  methodology,  we  concentrate  on  design 
dimensions  that  are  absent  and  most  difficult  to  generate  with  previous  methods;  that  is, 
experiments  with  n  <  50  and  k  approaches  n. 
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A. 


SIMPLIFYING  THE  COMPUTATION  FOR  NONORTHOGONALITY 


In  explaining  our  optimization  approach  to  address  the  DOE  problem,  we  find  it  is 
more  useful  to  discuss  LHs  in  terms  of  the  ranks  that  correspond  to  each  factor’s  value 
levels.  Because  LH  designs  assume  all  variables  are  continuous,  the  experimenter  simply 
sets  the  number  of  unique  values  to  equal  the  number  of  runs.  Unique  factor  values 
ensure  that  there  are  no  ties  in  the  ranks  of  any  column  in  the  design.  Consequently,  for  a 
design  with  n  runs,  each  column  in  the  LH  design  is  merely  a  permutation  of  integers 
1  to  n. 


Considering  values  as  ranks  from  1  to  n  simplifies  computations.  For  instance, 

_  _  77  +  1 

the  average  value  in  any  column,  /,  is  always  xl  =  x  =  .  The  corresponding  variance 


|  n  '  2  | 

for  any  column  is  of  =  cr2  =  —  'Y  ( x‘  -  x)  = - ,  i.e.,  a  constant,  for  any  n.  The 

n  I=l  12 


12 

1 


covariance  between  columns  /  and  m  is  alm  = - V  (jy  -  x  )  ( x”  -  x ) .  Therefore,  the 

n- itr 


correlation  between  columns  /  and  m  is 


Plm 


cr, 


lm 


n 

Z 

i= 1 


i  n  + 1 

x, - 


v  n+n 

m  rt  i  i 

xi - 

A  2  J 


2  2 

CT I  CT 

l  m 


z- 


!= 1 


77  +  1 


Y 

y 


(4.1) 


For  any  dimension  n  and  k,  we  wish  to  find  a  design  matrix,  Xn  x  k ,  with  the 
smallest  maximum  absolute  pairwise  correlation  p  .  We  use  an  optimization  model 
and  algorithm  to  help  minimize  pmap . 


B.  A  LINEAR  FORMULATION  OF  THE  MATHEMATICAL  MODEL 


From  Equation  4.1,  the  denominator  is  constant  for  all  /  and  m.  The  most 
important  features  of  the  equation  are  in  the  numerator.  Therefore,  minimizing  p  is 


equivalent  to  minimizing 


map 


=  max 

htm 


(4.2) 
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Our  task  to  minimize  v  is  constrained  by  the  requirement  that  each  column  of  X  is  a 

pennutation  of  the  first  n  natural  numbers.  Difficulties  in  fonnulation  arise  in  the 
nonlinearity  and  nondifferentiability  of  the  absolute -value  function.  To  construct  an 
OLH  or  NOLH  with  dimensions  n  and  k  using  optimization  techniques,  we  develop  a 
linear  mathematical  model  for  Equation  4.2. 

1.  An  Initial  Mathematical  Model 

We  follow  the  Naval  Postgraduate  School  Operations  Research  format  (Dell, 
2004)  for  presenting  an  optimization  problem.  We  annotate  the  objective  and  constraint 
equations  in  these  fonnulations  for  reference.  Our  initial  formulation  of  the 
problem  follows. 

Model  1  ( n ,  k): 

INDICES 

z  runs  (alias j )7  i=\,...,n 

l  factors  (alias  m)  l=\,...,k 

VARIABLES 

X!  level  of  /th  factor  on  the  z'th  run 

V  variable  for  vmap 

FORMULATION 


min  V 

(AO) 

Y1 

r— t- 

"n 

IV 

TT 

-H 

i 

JS 

'  3 

1 

V/  ^  m 

(Al) 

v>-£(x!-x)(x--x) 

i= 1 

V/  ^  m 

(A2) 

V/  *j,l 

(A3) 

\<X\<  n,  integer 

Vz,/ 

(A4) 

The  objective  function  AO  is  simply  V;  our  constraints  force  V  to  assume  the  value 
v  at  our  approved  solution.  Because  the  absolute  value  is  a  nonlinear  expression, 

7  The  alias  of  an  index,  i,  allows  the  programmer  to  use  a  different  index,  /.  to  refer  to  the  same  set 
of  numbers. 
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constraints  A1  and  A2  separate  it  into  terms  that  we  can  more  readily  transform  into 
linear  tenns.  Constraints  A3  and  A4  require  that  each  column  of  X  be  a  pennutation  of 
the  integers  from  1  to  n.  Constraints  A1  and  A2  are  nonlinear,  and  constraints  A3  and  A4 
are  nonconvex. 

2.  Transformation  of  a  Nonlinear  Optimization  Problem 

We  reformulate  our  mathematical  model  with  the  goal  of  converting  it  into  a 
linear  optimization  problem.  The  advantage  of  linear  functions  is  that  they  are  proven  to 
be  convex  and  thus  have  one  or  more  extreme  points  that  correspond  to  maximum  or 
minimum  objective  values  (Bazaraa  et  ah,  2005). 

Optimization  frequently  centers  on  solving  convex  problems,  of  which  linear 
types  are  most  easily  detennined.  The  general  problem  of  convex  optimization  is  to  find 
the  minimum  of  a  convex  or  quasiconvex  function  on  a  finite-dimensional  convex  space, 
specified  by  a  set  of  extreme  points  and  extreme  rays  or  vectors  (Bazaraa  et  ah,  2005). 
As  so  defined,  a  maximum  or  minimum  value  of  a  convex  function  may  be  found  by 
systematically  examining  extreme  points  from  the  convex  space  (Bertsimas  &  Tsitsiklis, 
1997:  Nash  &  Sofer,  1998). 

The  constraints  of  the  mathematical  model  define  the  feasible  region  of  the 
objective  function.  A  set  of  convex  or  quasiconvex  constraints  result  in  a  convex  or 
quasiconvex  feasible  region,  which  contains  extreme  points  and  an  optimal  solution. 
Reciprocally,  a  feasible  region  resulting  from  one  or  more  nonconvex  constraints  results 
in  a  nonconvex  feasible  region,  which  cannot  guarantee  an  optimal  solution. 
Unfortunately,  we  find  the  design  of  experiment  problem  to  be  nonlinear  and  nonconvex. 

Our  first  step  to  linearize  the  problem  is  to  refonnulate  A3  by  introducing  binary 
variables.  We  represent  one  run  (design  point)  i  of  one  factor  /  with  a  set  of  n  binary 
variables,  one  for  each  possible  level  j  of  the  factor  on  that  run.  For  a  binary  variable, 
Y.j ,  if  the  factor  /  is  set  to  the  j,h  level  in  the  ith  run,  we  let  the  variable  equal  one,  and 

zero  otherwise.  Therefore,  K47  =  1  means  that  for  the  4th  factor,  the  level  is  set  at  7  in  the 

3ld  run  of  the  experiment.  The  addition  of  these  variables  requires  new  constraints  to 
control  the  values  that  they  may  assume.  Our  revised  model  follows. 
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Model  2  (n,  k ): 

VARIABLES 

Y!.  equals  1  (0  otherwise)  if  factor  /  is  set  to  level  j  in  run  i 

V  variable  for  vmap 

FORMULATION 

min  V  (BO) 

s.t.  V  (Bl) 

'= 1  V  i= 1  J  V  )= 1  J 

Vl,in;l*m  (B2) 

,=i  vt=i  Am  J 

=  i  v/,;  (B3) 

Zru=i  V,l  (B4) 

Z=1 

^•£{0,1}  Vi,j,l  (B5) 

Constraints  Bl  and  B2  are  reformulations  of  A1  and  A2.  Constraints  for  B3 
require  that  exactly  one  level  be  chosen  for  each  factor,  for  each  run.  Constraints  B4 
ensure  that  each  level  is  chosen  exactly  one  time  for  each  factor.  These  constraints  are 
linear  since  they  are  nothing  more  than  a  summation  of  simple  variables.  However, 
constraints  B 1  and  B2  are  still  nonlinear  because  they  involve  the  product  of  K.f  and  Y”\ . 

3.  A  Complete  Linear  Formulation  of  the  Optimization  Problem 

There  are  several  ways  to  deal  with  the  nonlinearity  in  Bl  and  B2.  One  is  to 
develop  derivative-free  heuristic  algorithms  (including  the  many  varieties  of  local  search) 
that  generate  many  design  matrices,  and  evaluate  V  for  each  one.  Evaluating  V  is  much 
easier  than  optimizing  it,  but  we  recognize  that  heuristics  do  not  guarantee  an  optimal 
solution,  although  they  sometimes  result  in  one. 

Our  approach  to  the  difficulties  associated  with  constraints  Bl  and  B2  is  to 
optimize  one  column  (factor)  of  the  design  matrix  at  a  time.  We  find  level  settings  for  a 
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given  factor,  m,  that  minimizes  vmap  between  m  and  every  other  (fixed)  column  in  the 

design.  Therefore,  variables  corresponding  to  factor  m  are  the  only  ones  that  appear  in 
the  model,  and  we  treat  all  other  factors  in  the  design  as  having  “fixed”  values,  denoted 
X\ .  The  resulting  levels  for  the  targeted  factor  m  guarantees  the  lowest  value  for  v  , 

given  that  all  other  factor  levels  remain  constant.  This  final  formulation  is  a  single¬ 
column  optimization  program  for  a  specified  column,  m,  in  a  design  with  n  runs  and  a 
total  of  k  factors. 

Model  3  (m,  k  ,n ): 

DATA 


Xt  level  of  factor  /  for  run  i  in  design  X 

FORMULATION 

min  V 


s.t.  i  ±(x‘,-4±jr< 

i= 1  V  J= 1 

n  /  .  \fn 

VZ-^Xl-x)  YjY; 


- X 


j  X 


i=l 


\J=l 


n 

If =i 

j= i 

n 

If =i 

f  fO.1} 


V/  ^  m 

V/  ^  /« 

V/,/ 

v/,/ 

Vi,  7,/ 


(CO) 

(Cl) 

(C2) 

(C3) 

(C4) 

(C5) 


Constraints  C 1  and  C2  are  now  linear  in  Ym .  Several  of  the  terms 
constraints  are  constant,  and  can  be  further  simplified: 


n  /  *  \  n 

=i(f-di  f 

i=l  7=1 


in  these 


(4.3) 
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This  simplification  yields  our  final  model. 

Final  model  3  (m,  n,  k ): 

INDICES 

i  runs  (alias j)  i=\,... ,n 

l  factors 

DATA 

X\  level  value  of  factor  /  for  run  i  in  the  given  design  X 

VARIABLES 

Y”1.  equals  1  (0  otherwise)  if  factor  m  is  set  to  level  j  in  run  i 

V  maximum  absolute  value  of  v  for  any  two  columns  in  X 


FORMULATION 

min  V 

(C0) 

CZ5 

r- F 

"7 

IV 

iM: 

jN> 

£  "J 

XT 

1 

V/  ^  m 

(Cl) 

n-±(xi 

i= 1 

j= i 

V/  ^  m 

(C2) 

2X = i 

j= i 

Vi 

(C3) 

ii 

£  "J 

v j 

(C4) 

Y'e{  0,1} 

v ij 

(C5) 

For  the  remainder  of  this  document,  we  refer  to  the  optimization  program  based 
on  the  final  model  as  VMIN:  a  minimization  model  for  the  variable  V. 

C.  OPERATIONALIZING  THE  LINEAR  OPTIMIZATION  PROBLEM 

This  final  fonnulization  is  key  to  our  search  for  new  designs  with  orthogonal  or 
nearly  orthogonal  properties.  We  encode  VMIN  into  GAMS  (General  Algebraic 
Modeling  System,  2008),  a  powerful  high-level  modeling  system  for  mathematical 
programming  and  optimization,  and  solve  using  the  commercially  available  solver  Cplex 
(2008).  GAMS  consists  of  a  language  compiler  and  a  suite  of  integrated  high- 
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performance  solvers  that  are  specifically  designed  for  modeling  large  linear,  nonlinear, 
and  mixed  integer  optimization  problems.  The  result  is  a  robust  programming  language 
that  can  solve  problems  with  thousands  of  linear  constraints  and  thousands  of  variables. 

1.  The  Size  and  Complexity  of  VMIN 

Because  the  complexity  of  even  relatively  small  problems  has  the  potential  of 
taxing  the  model  to  a  point  that  prevents  it  from  converging  to  a  solution,  we  deliberately 
choose  to  focus  on  one  factor  column  to  solve.  Equations  from  C3  in  VMIN  correspond 
to  a  total  of  n  equations — one  for  each  run  in  the  design.  Because  m  is  the  targeted 
column,  it  contains  the  only  set  of  variables  in  the  problem — one  for  each  level  of  the 
factor  m.  We  illustrate  with  a  linear  equation  for  C3,  where  the  targeted  (shaded)  column 
is  m  =  3  and  run  number  is  i  =  2.  If  we  set  n  =  5  runs,  there  will  be  five  value  levels, 
which  the  variables  7” ,  j  =  1,  2,  3,  4,  5  represent  in  the  shaded  column  of 
Table  13.  Column  (k#)  and  row  (n#)  labels  in  the  table  are  for  ease  of  reference. 

Table  13.  We  present  how  the  parameters  of  a  design  and  the  design’s  initial  values 
translate  into  the  constraints  in  VMIN  to  optimize  the  values  of  the  targeted 
variable  column  ( m  =  3,  k  =  4,  n  =  5),  where  value  levels  for  column  3  remain  as 
variables  and  all  other  values  are  fixed. 


kl 

k2 

k3 

k4 

nl 

3 

4 

7 

4 

n2 

4 

2 

7 

5 

n3 

1 

3 

73 

2 

n4 

2 

5 

73 

7  4  J 

1 

n5 

5 

1 

73 

7  5  J 

3 

The  constraint  for  this  instance  of  C3  is  72’,  +  72\  +  7’,  +  7234  +  7235  =  1 .  We  see  in 

this  equation  that  only  one  of  the  variables  may  equal  one — that  is,  only  one  value  level 
from  factor  3  can  be  assigned  to  the  second  run  of  the  experiment. 
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The  following  discussion  explains  the  constraints  involved  in  the  linear 
formulization  of  the  problem.  For  constraints  Cl  and  C2,  we  consider 

x  =  U  +  '  =  ^  +  '  =  3  and  the  fixed  values  for  columns  kl,  k2,  and  k4  of  the  initial  design 
2  2 

to  construct  the  following  pseudo  expression  for  v  . 


>•=1  j= 1 

V  >  (3  -3)(1^  +  2 y;2  +  3^3  +  4 Y;4  +  5Y;5)  +  (4  -  3)(1^  +  2l£  +  3l£  +  4 +  5^)  + 

(1  -  3)(1^  +2Y^+  3^  +  4^4  +  51*  )  +  (2  -  3)(1^  +  2Y^  +  3^  +  4Y^  +  5iJ )  + 
(5  -  3)(1^  +  2Y^  +  31*  +  41*  +  51*  ) 

V>  (4- 3)(1T14  +  2Y^_  +  3T,3  +  4Tj34  +  5^)  +  (2 -3)(1^  +  21*  +  3^  +  4^34  +  51*  )  + 
(3  -  3)(11*  +  2^32  +  31*  +  4Tj34  +  51*  )  +  (5  -  3)(1^3  +  2i;32  +  31*  +  4i;34  +  57^)  + 
(1-3)(1^+2^2+3^3+4^4+5^5) 

V  >  (4  -  3)(1T13  +  2Y^\  +  3^f3  +  4Tj34  +  5f^35)  +  (5  -3)(1^3  +  21*  +  3Y^  +  41*  +  5Y]3.)  + 

(2  -  3)(1^3  +  2Y32  +  31*  +  4T,34  +  51* )  +  (1  -  3)(1^3  +  2l£  +  3l£  +  4l£  +  51*  )  + 
(3  -  3)(11^^  +  2T,32  +  3T,33  +  4T,34  +  5T,35 ) 


V  > 


V  > 


V  > 


V  > 


i= 1  j= 1 

"(3  -  3X1T,3  +  21*  +  31*  +  41*  +  51*  )  +  (4  -  3)(1X3  +  2  1*  +  3  +  4  Y*4  +  51*)  +" 

-  (1  - 3)(1X3  +  21*  +  3l£  +  41*  +  51*  )  +  (2  - 3)(lX3i  +  21*  +  31*  +  41*  +5^)  + 
v(5-3)(lX3  +  21*  +  31*  +4X34  +5X3s) 

"(4  -  3)(1X3  +  2X32  +  31*  +  41*  +  51* )  +  (2  -  3)(1X3  +  21*  +  3  ^  +  47^  +  51*  )  +" 

-  (3  -  3)(1X3  +  2X32  +  3X33  +  4T,34  +  5X35)  +  (5  -  3)(1X3  +  2Y32  +  31*  +  41*  +  5^)  + 
v(1-3)(1T13+2X32+3^3  +4X34  +5X35) 

"(4  -  3)(1X3  +  2X32  +  31*  +  41*  +5X3s)  +  (5  -  3)0^  +  2  1*  +  3  1*  +  4  Y^  +  51*)  +N 
~  (2  -  3)(1X3  +  2l£  +  3^  +  41*  +  5^5 )  +  (1  -  3)(1X3  +  21*  +  3 1*  +  41*  +  51* )  + 
v(3  -  3)(1713  +  2T132  +  3^  +  47j34  +  51*  ) 


Constraints  for  C3  ensure  that  each  run  has  exactly  one  value  level  assigned. 
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Zi""=l  v,- 

j= 1 

y3  y3  +  y3  +  y3  +  73  =  1 

Ji,iTJi,2TJyT1i,4TJU 

y3  +  y3  +  y3  +  y3  +  y3  =  1 

■‘2,1  T  12,2  T  J2,3  T  ^2,4  T  J2,5  1 

y3  +y3  +y3  +y3  +y3  =  i 

J3,l  T  ^3,2  T  ■‘3,3  T  ‘3,4  T  ■t3,5  1 

y3  +  Y3  +  y3  +  y3  +  y3  =  1 

■'4,1  T ‘4,2  T  ■‘4,3  T  ■'4,4  T  ■‘4,5  1 

y3  +  y3  +Y3  +Y3  +Y3  =  1 

■'5,1  TJ5,2  1 

Constraints  for  C4  ensure  that  each  value  level  is  assigned  exactly  one  time. 

lX;=i  y/ 

i= 1 

y3  +  y3  +  y3  +  y3  +  y3  =  1 
■'u  T  ■‘2,1  T  ■'3,1 T  ■'4,1  T  -'5,1  1 

y3  +  y3  +  y3  +  y3  +y3  =i 

■'l^  ^■'2,2  t-£4,2  1 

y3  +  y3  +  y3  +  y3  +y3  =i 

j1,3tj2,3tj33t‘43t'5,3  1 

y3  +  y3  +  y3  +  y3  +  y3  =i 
‘1,4  T  ■'2,4  T  ■'3,4  T  ■'4,4  T  ■'s^  1 

y3  +  y3  +  y3  +  y3  +  y3  =i 

■'1,5  T  J2,5  T  ■'3,5  T  ‘4,5  T  15,5  1 

Finally,  constraints  for  C5  ensure  that  each  binary  variable  assumes  only  the  value  of  1  or 
0.  It  requires  a  set  of  two  constraints  for  each  variable, 
ysjo.i}  vij 
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VMIN  constraints  become  more  complex  as  n  and  k  increase.  There  are  25  binary 
variables,  1  continuous  (objective)  variable,  and  66  constraints  for  this  relatively  small 
design.  The  size  of  the  problem  increases  quadratically  in  the  dimensions.  Consider  a 
problem  n  =  33  and  k  =  11.  The  problem  now  involves  nearly  1,100  equations  and 
almost  1,000  binary  variables  to  detennine  an  optimal  arrangement  of  value  levels  for 
just  one  factor,  m.  The  solution  is  a  subset  of  variables  of  size  n,  each  equaling  one,  and 
the  remaining  n(n  -  1)  variables  being  zero. 

2.  Applying  RLH  and  Florian’s  Method  to  Initiate  VMIN 

Random  Latin  hypercube  (RLH)  generation  and  Florian’s  correlation  reduction 
method  provide  an  excellent  initial  solution  to  VMIN.  Preliminary  computations  to 
support  an  optimization  process  are  categorized  as  heuristic  algorithms — methods  to 
reach  suboptimal  solutions  (Bertsimas  &  Tsitsiklis,  1997). 

A  good  initial  solution  aids  the  optimization  routine  to  solve  the  problem  more 
quickly  and  with  a  greater  chance  of  reaching  a  low  vmap .  In  our  process,  we  obtain  a 

good  initial  solution  by  quickly  generating  many  RLHs — through  k  random  pennutations 
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of  the  first  n  natural  numbers — and  selecting  the  one  with  minimum  p  .  Then,  we 

iteratively  apply  Florian’s  (1992)  proven  correlation  reduction  method  to  create  an  FRLH 
that  serves  as  the  starting  solution  for  VMIN.  In  practice,  for  designs  of  the  scale  in  this 
paper  (or  a  little  larger),  it  takes  only  seconds  on  a  modem  desktop  processor  to  generate 
hundreds,  or  even  thousands,  of  RLHs  for  designs  with  scores  of  variables.  The  FRLH 
improvement  on  the  best  RLH  usually  takes  a  few  minutes.  It  typically  takes  on  the  order 
of  a  few  hours  to  generate  the  final  design — which  likely  requires  several  iterations  of 
VMIN.  Additional  columns  can  be  added  by  appending  a  new  random  column  and  then 
iteratively  applying  VMIN.  For  the  size  of  the  experiments  we  discuss,  it  usually  takes 
on  the  order  of  tens  of  minutes  to  optimize  the  new  column. 

Table  14  shows  a  design  with  n  =  17  runs  and  k  =  7  factors.  The  associated 
objective  value,  V,  for  the  initial  design  corresponds  to  p  =  0.066  .  Table  contents  are 

the  actual  levels  for  the  column  factor  for  each  run.  There  are  17  =  289  variables  for 
each  specific  factor,  m,  in  this  problem:  Yfl,Yf1,...,Y*2A,Y*25,...,Yfll6,Yf111 .  Table  15 

illustrates  the  initial  binary  value  for  Y. '264  =  1  — meaning  that  the  actual  level  value  for 
factor  6  on  the  2nd  run  is  4,  and  reciprocally  Tjf  =  0  for  all  j  ^  4 . 
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Table  14.  This  n  =  \l,k=l  design,  p  =  0.066  ,  is  a  result  from  generating 

numerous  RLHs,  selecting  the  design  with  the  least  nonorthogonality,  and 
applying  Florian’s  method  iteratively.  It  is  the  initial  design  before  using  VMIN. 


kl 

k2 

k3 

k4 

k5 

k6 

k7 

nl 

15 

2 

7 

5 

8 

9 

15 

n2 

17 

15 

11 

11 

6 

4 

14 

n3 

5 

7 

8 

14 

2 

16 

4 

n4 

13 

10 

1 

9 

15 

13 

5 

n5 

11 

6 

17 

3 

12 

15 

1 

n6 

3 

14 

14 

13 

10 

6 

10 

n7 

7 

16 

2 

6 

11 

14 

12 

n8 

8 

9 

13 

1 

16 

7 

8 

n9 

1 

11 

5 

2 

7 

8 

6 

nlO 

14 

13 

3 

15 

5 

11 

3 

nil 

10 

5 

12 

10 

4 

1 

2 

nl2 

2 

1 

4 

8 

3 

2 

11 

nl3 

4 

3 

6 

16 

17 

12 

16 

nl4 

6 

12 

16 

17 

13 

10 

9 

nl5 

9 

17 

10 

4 

9 

3 

13 

nl6 

16 

4 

9 

12 

14 

5 

7 

nl7 

12 

8 

15 

7 

1 

17 

17 

For  a  more  expansive  look  into  this  example,  we  continue  to  focus  on  column 
m  =  6  of  the  initial  design.  The  initial  Y™  solution  set  for  m  =  6  is  more  easily  seen  in 

Table  15  as  a  two-dimensional  display  of  variable  values  that  show  exactly  one  factor 
level  (/)  assigned  to  each  run  (n).  This  matrix  of  zeroes  and  ones  corresponds  to  the 
values  in  column  vector  (k6)  from  Table  14. 
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Table  15.  Binary  variable  values  for  factor  m  =  6  in  the  initial  design  show  the 
sparseness  of  this  single  column  optimization  problem.  A  set  of  these  variables 

corresponds  to  each  factor. 


m  =  6 

jl 

j2 

j3 

j4 

j5 

j6 

j7 

j8 

j9 

jio 

jll 

j!2 

j!3 

j!4 

j!5 

j!6 

j!7 

nl 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

n2 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

n3 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

n4 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

n5 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

n6 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

nl 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

n8 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

n9 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

nlO 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

nil 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

nl2 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

nl3 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

nl4 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

nl5 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

nl6 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

nl7 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

We  apply  VMIN  to  target  factor  m  =  6  to  improve  V.  In  the  following  paragraphs 
we  discuss  the  root  mean  square  (Owen,  1994)  to  explain  the  rationale  for  selecting  m  =  6 
as  the  column  to  optimize.  VMIN  focuses  on  the  problem  of  populating  the  above  matrix 
with  ones  in  the  appropriate  cells,  and  zero  elsewhere.  VMIN  considers  each  of  the  289 
variables  in  the  problem,  and  through  the  Cplex  solver,  detennines  a  solution  (i.e.,  an 
arrangement  of  values  of  “l”s  in  the  matrix)  that  minimizes  V  to  within  a  user-specified 
tolerance  of  the  optimal  solution. 

Tolerance  is  the  difference  between  the  notional  optimal  value  of  the  objective 
function  and  the  actual  computed  value,  divided  by  the  notional  value,  and  multiplied  by 
100%.  Common  practice  sets  tolerance  at  10%,  but  we  use  1%  in  VMIN  to  approach 
orthogonality.  We  adjust  tolerance  as  design  dimensions  near  saturation. 
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3.  Iterative  Application  of  VMIN 

Iterative  application  of  VMIN  constructs  the  entire  design.  Solving  one  instance 
of  VMIN  guarantees  an  optimal  solution  for  the  current  factor  of  interest.  However,  our 
simplification  of  the  optimization  problem  to  one  column  leaves  us  with  a  nonoptimal 
solution  for  the  overall  DOE  problem.  To  reach  orthogonality  or  near  orthogonality 
requires  a  deliberate  process.  Therefore,  after  determining  the  best  value  levels  for  one 
column,  we  select  a  new  column  to  optimize.  We  repeat  this  process  until  we  have  an 
NOLH,  or  there  is  no  further  progress. 

Selecting  the  next  column  to  optimize  can  become  complex.  There  are  other 
methods  for  selecting  the  next  column  to  optimize,  such  as  selecting  a  column  that  is 
associated  with  pma  ,  but  this  value  exists  for  at  least  one  other  column.  It  is  possible 

that  even  more  than  two  columns  are  associated  with  p  .  Instead,  following  Owen 

(1994),  we  use  root  mean  square  (rms)  as  the  measure  for  selecting  the  next  column  to 
optimize.  Computing  rms  for  a  column  m  involves  summing  the  squared  correlation 
coefficients  between  m  and  all  other  columns  in  the  design,  and  dividing  the  total  with  the 
number  of  unique  column  pairs,  which  yields 

where  pIm  is  the  correlation  coefficient  in  (1.1),  and  k  is  the  number  of  columns  in  the 

design.  We  select  the  column  with  the  largest  rms  as  the  one  to  optimize.  The  greatest 
rms  identifies  the  column  that  has,  on  average,  the  largest  squared  correlation  with  any 
other  column,  and  is  therefore  the  most  problematic,  relative  to  all  other  columns. 
Selecting  this  column  for  optimization  increases  the  chances  for  reducing  the  design’s 
overall  nonorthogonality.  In  practice,  this  approach  has  proven  straightforward,  tractable, 
and  successful.  Although  iterative  column  optimization  may  not  terminate  at  a  solution 
that  equates  to  the  lowest  possible  p  ,  it  provides  a  near  optimal  solution  that  has  a 

very  good  chance  of  meeting  NOLH  criteria. 

Numerous  difficulties  emerge  when  using  optimization  to  solve  the  complete 

DOE  problem.  It  is  a  primary  reason  that  scientists  have  sought  other  methods  for 
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developing  experimentation  schemes.  Our  presentation  of  a  solution  for  m  =  6  in 
Table  15  offers  a  glimpse  of  the  issues  that  an  optimization  approach  faces.  The  two- 
dimensional  matrix  presents  the  VMIN  attempt  to  position  “l”s  where  no  two  entries  are 
in  the  same  row  or  column.  Taking  this  same  problem  vertically  for  each  factor  further 
complicates  the  arrangement  of  “l”s.  A  new  restriction  that  no  two-factor  (in)  matrices 

should  have  the  same  run  (n)  and  level  (j)  cell  filled  complicates  the  problem  sufficiently 

2 

to  challenge  the  most  robust  optimization  schemes.  Such  a  problem  requires  m  sets  of  n 
variables  moving  simultaneously.  The  resulting  problem  is  nonlinear  and  nonconvex. 

Restricting  the  optimization  problem  to  one  factor  at  a  time  allows  us  to  transform 
it  into  a  linear  one.  Our  formulation  offers  a  means  for  reaching  a  focused  solution  that 
guarantees  an  optimal  solution  for  any  given  column,  m.  The  method  that  we  employ  to 
generate  an  initial  solution  is  grounded  in  optimization  theory  and  practical  experience. 
Our  iterative  application  of  VMIN  methodically  reduces  p  to  an  acceptable  level. 

Short  of  optimality,  creation  of  new  designs  that  meet  NOLH  criteria  satisfies  our  study 
goals.  This  method  breaks  free  from  previous  constraints  to  produce  NOLHs  in  design 
dimensions  that  other  approaches  do  not. 

D.  NEW  DESIGNS  FROM  VMIN 

This  section  describes  the  full  impact  of  combining  FRLH  and  optimization  into  a 
new  methodology.  Their  synthesis  produces  new  NOLH  designs.  We  quantify  our 
method’s  effectiveness  and  present  excerpts  of  new  NOLH  designs  that  are  not  possible 
to  construct  using  earlier  techniques.  The  relative  ease  and  flexibility  of  this  technique 
make  it  an  attractive  option  for  constructing  new  and  efficient  computer  experiments. 

As  we  previously  stated,  small  NOLH  designs  (n  <  50)  are  difficult  to  generate  if 

fl 

k>  — .  Cioppa’s  (2002)  method  generates  millions  of  LHs  before  selecting  one  that 
warrants  application  of  Florian’s  method.  In  practice,  Florian’s  method  successfully 
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Tl 

produces  NOLH  designs  for  n  >  50  and  k  <  —  with  a  single  RLH  generation,8  but 

cannot  guarantee  an  NOLH  for  dimensions  that  violate  these  combinatorial  conditions. 

Our  optimization  program  is  a  complement  to  FRLH.  It  not  only  provides  a 

Tl 

means  to  create  NOLH  designs  with  dimensions  n  >  50 ,  —<k<n,  it  covers  design 

dimensions  for  any  n,  and  k  — >  n  .  Designs  in  Appendix  A  demonstrate  our  technique’s 
utility. 

Cioppa  and  Lucas  (2007)  discuss  a  number  of  OLH  and  NOLH  designs  and 
compare  their  orthogonal  properties  with  the  average  correlation  measure  of  simple  LHs. 
An  excerpt  from  their  article  shown  in  Table  16  shows  their  dimensions  and 
corresponding  orthogonal  characteristics. 

Table  16.  An  excerpt  from  Cioppa  and  Lucas  (2007)  compares  RLHs  and  Cioppa’s 
(2002)  OLH  and  NOLH  designs  with  respect  to  their  orthogonal  characteristics. 


Design 

Max  Pairwise 
Correlation 

O33 

'A  i 

0 

Best  Nil 

0.0234 

Mean  R,33 

0.4015 

O65 

A  6 

0 

Best  A,665 

0.0219 

Mean 

0.3194 

o 129 
^22 

0 

Best  N™ 

0.0015 

Mean  R\f 

0.2332 

Gaps  in  the  current  OLH  and  NOLH  library  exist  because  of  the  rigidity  of 
dimensional  constraints.  To  achieve  orthogonality,  Cioppa’s  existing  convention  for 
design  dimensions  requires  an  increase  in  the  number  of  runs  from  33  to  65  when  the 

8  In  practice,  one  RLH  generation  is  nearly  always  sufficient  to  produce  a  NOLH  using  Florian’s 
method.  However,  choosing  the  RLH  with  the  least  maximum  absolute  pairwise  correlation,  from  up  to 
1,000  RLHs,  helps  increase  the  chances  of  producing  an  NOLH. 
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number  of  factors  increases  from  1 1  to  12.  In  this  example,  there  is  a  breach  in  the  OLH 
and  NOLH  library  for  designs  that  have  less  than  17  runs  and  1  <  k  <  7 ,  or  for  designs 
1 8  <  n  <  32  and  8  <  k  <  1 1 . 

Efforts  to  produce  smaller  designs  benefit  from  starting  with  a  “good”  initial 
design  (Tang,  1998).  We  consider  a  starting  design  for  an  8  x  3  RLH,  which  violates  our 

g 

rule  of  thumb;  n .  =  8  <  50  and  k  =  3  >  —  =  2.67  .  There  are  no  known  catalogued  OLH  or 

NOLH  designs  with  these  dimensions.  We  generate  an  LRLH  design,  which  has  a  p 

equal  to  0.0476,  shown  on  the  left  side  of  Table  17  Although  the  LRLH  meets  Ang’s 
near-orthogonality  criteria,  we  employ  VMIN.  We  use  the  LRLH  design  as  a  heuristic 
solution  to  initiate  VMIN.  VMIN  optimizes  the  third  column  resulting  in  a  fully 
orthogonal  design,  038  :  p  =  0  and  a  condition  number  of  1.0.  We  add  the  final  design 
on  the  right-hand  side  of  Table  17  to  the  OLH  library. 

Table  17.  Results  from  a  small  design  after  iterative  application  of  Llorian’s  method 
on  the  left-hand  is  not  an  OLH  design.  It  can  be  used  as  the  initial  design  for 
VMIN.  The  final  design  is  orthogonal,  as  shown  on  the  right-hand  design. 


We  summarize  our  methodology  to  construct  new  designs  in  the  following  steps 
and  present  it  graphically  in  Ligure  9. 

Step  1:  Consider  the  dimensions,  n  and  k,  for  the  desired  design. 

Step  2:  Detennine  if  OLH  or  NOLH  designs  exist  for  desired  dimension: 
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If  OLHs  or  NOLHs  with  the  same  design  dimensions  are  available,  use  them.  As  an 
alternative,  determine  if  it  is  possible  to  construct  a  new  design  based  on 
previous  methods. 

If  the  designs  do  not  exist  in  the  library,  or  design  dimensions  do  not  meet  OLH  and 
NOLH  conventions,  go  to  Step  3. 

Step  3:  Determine  if  it  is  necessary  to  apply  the  combined  technique  or  only 
Florian’s  method. 

Yl 

If  n  >  50  and  k  <  — ,  generate  one  or  more  RLHs  and  iteratively  apply  Florian’s  method. 

Upon  tennination,  the  resulting  design  will  likely  possess  NOLH  properties — if  not,  try  a 
few  more.  If  unsuccessful  after  a  few  iterations,  go  to  Step  6. 

n 

If  n  >  50  and  —<k<n,  or  n  <  50 ,  go  to  Step  4. 

Step  4:  Generate  up  to  1,000  RLHs  and  select  the  design  with  the  minimum  p  and 
designate  it  Xinitial. 

Step  5:  Iteratively  apply  Florian’s  method  on  Xinitial  until  pmap  does  not  improve. 
Designate  this  improved  design  as  XH  ;  the  result  of  a  heuristic  approach. 

Step  6:  With  XH  as  a  start  point,  implement  VMIN  iteratively. 

Step  7:  Record  the  resulting  design.  If  the  design  is  unsatisfactory,  return  to  Step  4  and 
repeat  with  new  RLHs. 

A  departure  from  previous  techniques,  we  find  that  our  methodology  usually 
uncovers  more  than  one  design  from  which  to  choose.  Time  permitting,  an  experimenter 
can  generate  a  number  of  candidate  NOLHs,  and  then  use  criteria  other  than  p  ,  such 

as  space-filling  properties,  to  select  the  NOLH  that  best  meets  the  experiment’s 
requirements.  Some  methodologies  are  deterministic  in  their  final  designs,  such  as 
Steinberg  and  Lin  (2006)  and  Ang  (2006),  leaving  the  experimenter  with  only  a  single 
choice  at  their  limits.  The  opportunity  to  quickly  produce  different  NOLH  designs,  with 
few  dimensional  constraints,  is  an  important  strength  of  our  method.  Figure  9  outlines 
the  steps  to  achieve  these  designs. 
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Step  1 : 


Step  2: 


Step  3: 


Step  4: 


Step  5: 


Figure  9.  This  flow  chart  presents  the  combined  process  for  matrix  transformation  and 
optimization  of  an  MIP  to  construct  an  OLH  or  NOLH  design. 

We  demonstrate  our  methodology  with  a  design  for  14  runs  and  7  factors.  The 
resulting  designs  are  in  Table  18. 


Step  1:  The  desired  design  has  dimensions  n  =  14  and  k=  7. 

Step  2:  These  dimensions  do  not  conform  to  OLH  or  NOLH  convention,  which 
prompts  the  experimenter  to  go  to  Step  3. 

k  7 

Step  3:  Because  «  =  14<50  and  k  =  7  =>  —  =  —  =  0.50  >0.33,  Florian’s  method 

n  14 

alone  will  not  likely  result  in  an  NOLH  design.  Go  to  Step  4. 

Step  4:  Using  scripts  that  we  developed  in  R  software,  we  generate  1,000  RLHs  and 
select  the  design  with  the  minimum  p  and  designate  it  Xinitial.  The  resulting 

design  has  pmap  equal  to  0.367  and  condition  number  4.489. 

Step  5:  Iteratively  applying  Florian’s  method  on  Xjnitial  results  in  a  design  with  a 
p  of  0.099.  We  designate  this  heuristic  as  XH  and  present  it  on  the  left-hand  side 
of  Table  18. 
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Step  6:  Iterative  implementation  of  VMIN  on  XH  results  in  an  NOLH  design  with 
p  of  0.033  and  a  condition  number  of  1.134.  We  show  the  final  design  on  the 
right-hand  side  of  Table  18. 

Step  7:  We  record  this  NOLH,  N]a ,  as  an  entry  into  the  NOLH  library. 

Table  18.  The  result  from  iterative  application  of  Florian’s  method  on  a  randomly 
generated  n  =  14,  &  =  7  design  is  the  initial  design  for  VMIN,  as  shown  on  the 
left-hand  side  of  the  table.  It  does  not  meet  NOLH  orthogonality  criteria.  The 
final  design  for  n  =  14,  &  =  7  after  VMIN  is  a  new  NOLH  design,  presented  on  the 

right-hand  side. 
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Final  Nearly  Orthogonal  Design: 

pmap  =0.033,  condition(X'X)  =1.13 
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We  reiterate  that  this  process  is  readily  repeatable — using  new  random  number 
streams  to  yield  a  new  starting  RLH — and  develop  many  different  NOLHs  with  our 
required  dimensions,  perhaps  using  other  criteria,  such  as  space-filling  properties  or 
consideration  for  higher  order  terms  (interactions,  quadratic,  etc.)  to  select  the 
best  design. 

An  extension  of  this  example  demonstrates  the  power  and  flexibility  of  our 
combined  methodologies.  To  develop  N™  we  use  a  new  random  stream  to  build  a  new 
n  =  14,  k  =  10  RLH  design,  and  iteratively  apply  Florian’s  method  to  produce  a  heuristic 
solution  to  initiate  VMIN.  An  V,'f4  design  was  possible  with  this  method.  Adding  one 
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new  column  at  a  time  from  a  random  permutation  of  the  value  levels  and  applying 
VMIN,  we  construct  the  nearly  saturated  NOLH  design,  Ay,4 ,  shown  in  Table  19. 


Table  19.  The  N\t  developed  by  RLH,  FRLH,  and  VMIN  emphasizes  the  flexibility 

of  our  methodology  to  construct  any  number  of  different  NOLH  designs  (to 
include  saturated  ones).  Note  that  none  of  the  columns  of  this  design  are  identical 
to  the  design  on  the  right-hand  side  of  Table  19.  It  shows  that  this  new  design  is 
not  merely  an  extension  of  the  n  =  14,  k  =  7  design,  although  our  technique  can 

certainly  extend  smaller  designs. 


Nearly  Saturated  NOLH  Design: 
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Certainly  another  approach  would  be  to  use  the  initial  n  =  1 4,  k  =  7  design  from 
Table  18  and  add  one  column  at  a  time,  but  starting  at  this  low  number  of  variables  is 
cumbersome.  We  chose  to  be  aggressive  and  began  with  an  k:n  ratio  equal  to  two-thirds, 
violating  our  “one-third”  rule  of  thumb.  The  flexibility  of  saturated  designs  permits 
experimenters  to  take  any  subset  of  columns  to  develop  a  new  design,  when  for  example, 
more  degrees  of  freedom  for  error  in  a  linear  fit  is  desirable.  We  complete  our 
development  of  this  design  with  a  final  application  of  Florian’s  method  to  determine  if  an 
improvement  is  possible.  There  is  no  improvement  in  p  and  we  enter  N\t  into  the 
NOLH  catalogue. 
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Our  method  requires  relatively  little  time  and  resources.  We  construct  the  design 
using  R  software  and  GAMS.  Generating  1,000  RLHs  in  R  requires  only  seconds  for  a 
14  x  10  design  matrix.  Applying  Florian’s  method  iteratively  takes  a  similar  amount  of 
time.  In  practice,  generation  of  1,000  small  RLHs,  selection,  and  administration  of 
Florian’s  method  are  immediately  sequential  and  involves  less  than  two  minutes.  The 
GAMS  program  requires  the  most  time,  but  current  applications  for  small  designs  show 
that  it  is  not  uncommon  for  VMIN  to  reach  completion  within  five  minutes. 

We  use  the  largest  of  the  “small”  catalogued  OLH  and  NOLH  designs — n  =  33 
and  k  =  1 1 — to  illustrate  the  minimal  resource  investment  for  our  methodology. 
Generating  1,000  RLHs,  we  create  an  FRLH  with  p  equal  to  0.0307  and  a  condition 

number  of  1.156,  which  meets  our  NOLH  criteria.  We  complete  this  process  in  less  than 
30  seconds  on  a  standard  desktop  processor.  Applying  VMIN  is  not  necessary  to  meet 
NOLH  criteria  for  a  33  x  11  design  matrix. 

As  a  test  case,  we  develop  a  saturated  design  that  does  not  adhere  to  OLH  or 
NOLH  dimensional  constraints:  n  =  33  and  k  =32  (see  Appendix  A).  Approaches  from 
Ang  (2006),  Cioppa  (2002),  and  Ye  (1998)  do  not  develop  designs  to  explore  32 
variables  in  so  few  runs.  Ang  (2006)  requires  65  runs  to  address  32  variables.  Cioppa 
(2002)  requires  a  sample  size  of  513,  while  Ye  (1998)  requires  an  increase  in  the  number 
of  runs  to  131,073,  for  m  =  17.  Steinberg  and  Lin  (2006)  recognize  that  their 
methodology  faces  limitations  in  design  dimensions.  A  comparison  of  the  methods  to 
construct  a  design  to  explore  32  factors  is  in  Table  20.  The  comparison  shows  the 
advantages  of  our  new  technique  over  these  other  construction  methods. 
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Table  20.  A  comparison  of  different  approaches  to  develop  a  design  that  can  explore 
32  variables  shows  that  our  new  S-NOLH  design  requires  the  least  number  of 
runs,  and  is  therefore  most  efficient.  Other  approaches  can  theoretically  develop 
experimental  designs  to  explore  32  or  more  variables,  but  to  the  author’s 
knowledge,  none  are  currently  in  the  catalogue  of  OLH  and  NOLH  designs. 


k  =  32 

Ye 
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P map 
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0 
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In  the  process  of  developing  an  N 3323  design,  we  also  generate  three  new, 
individually  unique  designs  for  the  same  dimensions,  each  meeting  orthogonality  criteria. 
The  best  design  has  pmup  =  0.0435  .  Two  other  designs  have  p  values  of  0.0465  and 
0.0451.  There  are  nine  other  new  and  different  designs  that  do  not  meet  NOLH  criteria; 
they  possess  values  of  p  between  0.05  and  0.06.  Although  other  techniques  can 

theoretically  develop  a  design  to  explore  32  or  more  variables,  such  efforts  (e.g.,  Ang, 
2006  and  Steinberg  &  Lin,  2006)  result  in  exactly  one 
such  design. 

Our  recent  discussion  traces  known  design  dimensions,  but  the  flexibility  of  our 
technique  results  in  previously  unknown  OLH  and  NOLH  designs.  We  begin  with 
Cioppa’s  O^design  as  a  start  point.  Adding  a  new  column  consisting  of  an  ordered 

vector  of  1,  2,...,  17,  we  apply  VMIN.  The  result  is  a  new  orthogonal  design,  0\7 .  A 
repetition  of  this  process  produces  another  completely  orthogonal  design,  0\ 1 .  The 
author  knows  of  no  other  method  that  produces  an  OLH  of  this  dimension.  Our  extension 
of  this  process  does  not  produce  completely  orthogonal  designs,  but  meets  nearly 
orthogonality  criteria  for  an  A'37  design.  Beyond  13  factors,  we  apply  our  full 

methodology  to  produce  an  N\7a  design  with  p  =  0.0466 .  We  also  initiate  a  new 
FRLH,  discarding  Cioppa’s  O)1 ,  and  create  a  fully  saturated  NOLH,  N\],  with 
Pmap  =  0.0491  as  shown  in  Table  21.  Although  we  do  not  use  Cioppa’s  original  O)1  as  a 
basis,  the  N\]  design  still  retains  good  properties,  such  as  space  fdling  (see  Appendix  B). 


64 


Table  21.  This  is  a  new  design  from  an  original  FRLH  for  17  runs.  We  extend  the 
number  of  factors  to  explore  until  it  is  fully  saturated.  It  allows  an  experimenter 
to  explore  16  factors  within  17  runs.  The  k:n  ratio  of  0.94  is  large  and  makes  the 

design  a  good  screening  plan. 
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We  complete  our  examples  with  a  design  that  does  not  follow  current  size 
conventions.  We  seek  a  design  that  explores  20  factors  within  25  runs.  The  k:n  ratio  is 
0.80.  There  are  no  similar  designs  to  use  as  starting  templates.  Neither  n  nor  k  is  a  factor 
of  two.  After  initiating  a  new  FRLH,  we  produce  a  new  NOLH  design,  V205  , 
p  =0.0439  within  10  minutes  using  VMIN.  After  iteratively  adding  columns  we 
produce  a  final  S-NOLH  ( ),  shown  in  Appendix  A.  An  alternative  for  extending  an 
V24  to  become  an  iV245  is  to  add  a  new  row  (for  the  new  level  24th)  to  A234  and  apply 
Florian’s  method  and  then  VMIN  to  produce  an  NOLH.  Taking  the  new  NOLH  (now 
N;l)  we  add  a  new  column  and  again  apply  Florian’s  method  and  VMIN,  respectively. 
The  result  is  a  new  saturated  NOLH  for  /V24 .  The  flexibility  of  this  alternate  process  is 
an  important  attribute  of  this  method. 

Table  22  compares  a  few  design  dimensions  from  different  construction 
methodologies.  It  is  evident  that  our  new  method  of  combining  FRLH  and  mixed  integer 
programming  (FRLH-MIP)  has  the  flexibility  to  create  unique  designs  that  cater  to  the 
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experimenter’s  needs.  As  previously  mentioned,  Steinberg  and  Lin  (2006)  require  64 
runs  to  explore  32  factors,  while  FRLH-MIP  uses  33  runs  (Appendix  A).  This  short 
discussion  encompasses  three  unique  design  combinations  for  each  of  the  given 
techniques.  The  new  designs  are  saturated  and  further  fill  the  OLH  and  NOLH  library. 

Table  22.  A  summary  of  techniques  from  Cioppa,  Ang,  Steinberg  and  Lin,  and  our 
new  method  shows  that  the  FRLH-MIP  method  is  a  viable  option  to  extend  the 
library  of  OLH  and  NOLH  designs.  A  label  of  “NA”  designates  dimensions  not 
currently  available  from  the  technique. 


Cioppa 

Ang 

Steinberg 
and  Lin 

New 

n 

17 

17 

16 

17 

k 

7 

8 

12 

16 

P  map 

0 

0 

0 

0.0490 

n 

33 

33 

NA 

33 

k 

11 

16 

NA 

32 

P  map 

0.0234 

0 

NA 

0.0451 

n 

65 

65 

64 

64 

k 

16 

32 

56 

63 

P  map 

0.0219 

0 

0 

0.0443 

E.  SUMMARY 

Combining  FRLHs  with  an  optimization  routine  to  construct  new  designs 
provides  the  freedom  to  create  the  variety  of  orthogonal  and  nearly  orthogonal  design 
dimensions  that  we  have  discussed.  The  capability  to  produce  many  different  NOLH 
designs  with  the  same  dimensions  introduces  great  possibilities  in  stacking  and  crossed 
designs.  With  a  choice  of  different  designs,  possessing  acceptable  nonorthogonality 
traits,  scientists  can  use  other  measures  to  discriminate  among  them.  The  ,  and 

/V32  designs  in  Appendix  A  are  examples  of  this  flexibility.  Examining  each  design 
shows  that  they  are  not  dependent  on  each  other  (i.e.,  the  .'V,”  design  is  not  just  a 
one-column  extension  of  ).  Eliminating  one  column  from  ,  or  two  columns  from 
A32 ,  creates  three  different  A,”  designs.  In  comparison  with  the  effort  that  other 
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methods  require  to  produce  one  design,  the  undertaking  for  our  technique  is  relatively 
trivial.  Coupled  with  its  flexibility  in  design  dimensions,  our  approach  provides  scientists 
with  a  new  tool  to  fill  much  of  the  OLH  and  NOLH  library. 
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V.  MANAGING  MIXED-FACTOR,  MIXED-LEVEL 
EXPERIMENTS  WITH  A  NEW  DESIGN  METHODOLOGY 

In  accordance  with  the  LHS  construct  (McKay  et  ah,  1979),  OLH  and  NOLH 
designs  assume  that  all  variables  are  continuous.  Frequently  this  assumption  is  false. 
Many  experiments  include  a  number  of  discrete  variables  that  do  not  necessarily  have  the 
same  number  of  value  levels.  Converting  (by  rounding)  the  actual  values  of  these 
discrete  variables  onto  a  raw  OLH  or  NOLH  design,  especially  when  there  are  a  small 
number  of  design  points,  often  results  in  an  overall  design  with  poor  orthogonality 
properties.  We  designate  these  types  of  designs  as  mixed-factor,  mixed-level 
(MFML)  experiments. 

MFML  experiments  diminish  the  advantages  of  OLH  and  NOLH  designs.  In  this 
chapter,  we  explore  means  to  mitigate  these  disadvantages  and  restore  near  orthogonal 
properties.  The  foundation  for  creating  a  nearly  orthogonal  MFML  design  is  the  set  of 
continuous  variables  from  which  we  produce  an  NOLH  using  our  new  techniques.  Our 
study  exploits  the  dimensional  flexibility  of  our  new  designs,  by  combining  them  with 
stacking  methods  and  intelligent  application  of  proven  design  techniques.  The  resulting 
MFML  designs  retain  much  of  the  orthogonality  properties  of  the  basic  NOLH,  thereby 
maintaining  their  utility.  In  practice,  this  approach  has  worked  well  for  creating  an 
efficient  design  for  a  problematic  set  of  factors. 

A.  THE  BASE  CONTINUOUS  DESIGN 

The  set  of  continuous  variables  in  the  experiment  determines  the  base  design  and 
is  the  foundation  of  the  MFML  methodology.  Integer  variables  with  a  large  number  of 
value  levels  may  also  be  part  of  the  base  design.  The  complete  set  of  variables  in  the 
base  design  provides  the  flexibility  in  our  methodology.  Because  the  number  of  runs  for 
this  subset  of  variables  can  usually  be  set  to  any  number  n,  the  base  design  can  match  the 
design  for  the  set  of  discrete  variables  as  needed. 

Previous  chapters  describe  the  ease  of  generating  an  OLH  or  NOLH  from  a  set  of 
continuous  variables  with  a  confluence  of  one  or  more  techniques.  The  lone  “hard” 

constraint  with  this  methodology  is  that  n>  k .  However,  some  analysts  may  choose  to 
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avoid  saturated  designs.  Space-filling  characteristics  may  be  an  important  feature  for  the 

k 

experiment.  The  analyst  may  also  decide  that  a  design  where  —  <  0.33  is  more 

n 

appropriate,  or  desire  a  greater  number  of  degrees  of  freedom.  No  matter  the  method  for 
selecting  n,  it  presents  an  initial  restriction.  We  designate  this  initial  restriction  in  the 
number  of  runs  as  n, ,  where  nf  >  k  ,  and  use  it  to  guide  constructing  a  separate  design  for 
the  set  of  discrete  variables. 

B.  DISCRETE  INTEGER  VARIABLES  AND  STACKING  METHODOLOGY 

The  set  of  discrete  variables  presents  difficulties  for  creating  a  LH  design  that 
possesses  an  acceptable  pmap .  Often,  discrete  integer  variables  do  not  have  the  same 

number  of  value  levels.  In  such  cases  where  sample  size  is  a  constraint,  there  may  be 
little  chance  for  creating  a  complete  balanced  design  for  them  (Box  et  ah,  1978: 
Satterthwaite,  1959).  We  first  consider  cases  when  there  is  a  single  discrete  integer 
variable  mixed  with  an  experiment  that  predominantly  contains  continuous  variables. 
Our  method  also  handles  subsets  of  discrete  variables,  each  with  differing  numbers  of 
value  levels.  A  part  of  this  process  uses  stacking  methods — i.e.,  permuting  the  columns 
and  appending  additional  rows  or  another  design  (Cioppa  &  Lucas,  2007).  The  full 
methodology  includes  stacking  two  or  more  designs,  adding  columns  for  binary  factors 
through  random  permutations  of  {0,  1},  and  crossing  quantitative  designs  with  designs 
for  qualitative  variables.  The  goal  for  systematically  constructing  an  MFML  experiment 
is  to  create  a  nearly  orthogonal  random  balanced  design  (Satterthwaite,  1959). 

1.  One  Subset  of  Discrete  Variables  with  a  Common  Value  Level 

We  examine  the  set  of  discrete  variables  and  identify  the  value  level,  which  we 
denote  as  £. ,  for  each  variable,  i  =  1,2 If  it  is  reasonable  to  use  a  common  value 

level  £  (i.e.,  all  discrete  variables  have  the  same  number  of  value  levels),  it  is  much 
easier  to  generate  a  total  MFML  design  with  nearly  orthogonal  properties.  Given  that  £ 
is  greater  than  the  number  of  discrete  variables,  d,  we  apply  similar  techniques  for 
creating  an  NOLH  from  continuous  variables.  From  empirical  study,  cases  in  which 
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£  >  10  >  d  show  that  our  methods  often  result  in  a  design  with  an  acceptable  , o  .  For 

cases  in  which  £  <  d  ,  we  split  the  set  of  discrete  variables  into  smaller  subsets  such  that 
£>  d  is  true  for  each  new  subset,  permitting  the  application  of  previous  techniques  to 
create  an  NOLH.  In  the  next  section,  a  small  example  demonstrates  the  significance  of 
establishing  these  constraints  on  £ . 

2.  Addressing  More  Than  One  Subset  of  Discrete  Variables 

For  experiments  containing  several  discrete  variables  with  varying  numbers  of 
value  levels,  we  group  the  discrete  variables  into  subsets  with  corresponding  common 
£'s.  Subject  matter  expertise  and  the  analyst’s  reasoning  can  simplify  the  number  of 
subsets  with  which  to  work.  The  concept  for  developing  a  complete  design  with  these 
disparate  subsets  of  discrete  variables  is  to  incorporate  each  subset  design  as  an  LH  into 
the  overall  design.  Figure  10  illustrates  this  idea  later  in  the  section. 

We  identify  w  subsets  of  discrete  variables  and  designate  the  discrete  subset 
designs  as  DVvDV2,...,DVw.  Each  subset  has  a  corresponding  number  of  value  levels, 

£l,£1,...£w,  from  which  we  determine  the  least  common  multiple  (LCM)  and  designate  it 
as  nDV,  which  is  the  minimum  total  number  of  runs  for  all  discrete  variables  in  the 
design.  For  each  £. ,  there  exists  an  integer  value  ,  such  that  £ .  •bi  =  nDV,  i  =  \,2,...,w, 
and  detennines  the  initial  number  of  stacks  for  each  DVj .  We  adjust  the  number  of 
stacks  to  match  n,  if  n,  <  nDV ,  or  increase  n,  if  nf  >  nDV  . 

The  intent  of  creating  a  balanced  design  LH  (Box  et  ah,  1978;  Satterthwaite, 
1959)  drives  our  choice  for  grouping  discrete  variables.  We  contemplate  (.  to  determine 
whether  it  is  possible  to  incorporate  a  smaller  DV  within  a  larger  DV.  For  instance,  we 
consider  three  sets  of  discrete  variables  with  value  levels  5,  8,  and  10,  thus 
nDV  =  LCM  =  40.  We  denote  each  design  as  DVrDV2,  D  V, ,  respectively.  A  savvy 
analyst  could  reasonably,  if  resources  permit,  decide  to  increase  the  number  of  value 
levels  for  DV2  from  8  to  10,  thereby  creating  a  larger  group  (  D V, )  based  on  10  value 
levels.  This  simplification  makes  a  balanced  design  easier  to  obtain  for  a  reasonable  n. 
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The  experimenter  matches  two  stacks  of  DV \  within  every  one  D V3 .  We  update  LCM: 
nDV=LCM=  10. 

To  further  clarify  the  MFML  process,  we  establish  that  the  base  design  of 
continuous  variables  requires  n  =  40  design  points,  based  on  the  original  LCM.  We 
determine  the  final  number  of  stacks  for  DVl  and  D  V3  based  on  n  to  be  8  and  4, 

respectively.  The  four  stacks  of  D  V3  act  as  the  base  stack  for  the  eight  stacks  of  DVX .  A 
short  program  in  R  performs  iterative  random  stacking  of  DVX .  For  each  iteration  of 
stacking,  we  append  the  new  stack  to  the  stacks  of  DV3  and  compute  p  .  The 

experimenter  can  limit  the  number  of  iterations  to  a  number  of  trials  deemed  reasonable 
or  a  threshold  for  an  acceptable  p  . 

A  final  generation  of  a  base  design  from  the  remaining  continuous  variables 
results  in  an  MFML  design  with  the  structure  shown  in  Figure  10.  We  note  that  the 
flexibility  of  our  new  NOLH  designs  allows  the  base  design  to  adjust  to  the  discrete 
variable  designs.  For  clarification,  we  rename  Df j  as  DV5  and  DV3  as  DV10  in 
the  illustration. 


dv5 

DVio 

dv5 

dv5 

DV10 

dv5 

Base 

dv5 

DVio 

dv5 

dv5 

DV10 

dv5 

Figure  10.  The  structure  of  the  stacked  MFML  design. 

3.  Formalized  Stacking  Methodologies 

In  this  section,  we  discuss  our  stacking  approach  as  part  of  the  MFML 
methodology.  Whereas  stacking  is  sometimes  applied  to  increase  n,  we  use  stacking  as 
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an  opportunity  to  improve  p  for  the  subsets  and  overall  set  of  discrete  integer 

variables.  A  stacked  design  in  which  each  design  is  based  on  randomly  permuted 
columns  of  the  original  design  can  improve  the  p  of  the  complete  design,  such  that 

(pman)  <(/}„)  (Cioppa  &  Lucas,  2007).  This  method  of  stacking  helps 

efforts  to  improve  the  overall  design  that  includes  the  base  design. 

Stacking  also  provides  the  freedom  to  mitigate  assumptions  that  prove  false.  Our 
discussions  in  the  previous  two  sections  assume  ii>di,  /  =  1,2,...,  vv,  where  dt  is  the 
number  of  variables  in  the  ith  subset  of  discrete  variables.  Another  assumption  is  that 
nDV  >  kDV  ,  where  kDV  is  the  total  number  of  discrete  variables,  kDV  =dl+d2+...  +  dw , 

and  that  nDV  >  k  .  When  these  assumptions  do  not  hold,  we  use  stacking  methods. 

To  apply  stacking  methods,  we  first  establish  the  final  number  of  runs  for  the 
experiment,  n,  by  comparing  nDV  and  nf :  (1)  If  nDV  >nn  then  n  =  nDV ;  (2)  If  nDV  <  n, , 

then  we  take  the  ceiling9  of  the  ratio  of  the  n,  and  nDV  as  a  modification  factor,  such  that 


n  = 


*DV 


xnr 


The  number  of  stacks  for  each  DVj  in  the  overall  MFML  design  is 


s,  =  A  x 


,  recalling  that  bj  is  the  initial  number  of  stacks  for  each  subset  of  discrete 


variables.  We  adjust  bt  based  on  the  modification  factor, 


ni 

nDV 


,  if  this  is  possible. 


We  leverage  the  new  method  for  stacking  from  Cioppa  &  Lucas  (2007)  to  reduce 
the  nonorthogonality  of  the  overall  stacked  design.  We  generate  a  DV  design  with  a 
reasonable  p  and  permute  the  columns  to  create  a  “new”  design — once  for  each 

required  number  of  stacks,  s,  for  the  design.  Each  rearranged  LH  is  not  truly  a  new 
design  since  there  is  a  one-for-one  column  match  in  the  new  and  original  design,  albeit  in 
different  column  order.  However,  the  overall  stacked  design  is  different  and  an 
improvement  in  p  is  evident  (Cioppa  &  Lucas,  2007).  We  may  permute  the  k 


9 


The  ceiling  of  the  value  x  is  the  smallest  integer  value  that  is  greater  than  or  equal  to  x. 
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columns  in  any  of  k !  ways  and  choose  s  of  them  to  create  an  overall  stacked  design.  Put 


another  way,  we  can  choose  from  any  of 


stacked  designs.  We  select  the  set  of  new 


designs  based  on  a  desire  to  reduce  the  pmap  of  the  stacked  design.  Another  simple 

program  in  R  allows  a  search  for  a  set  of  “new”  designs  that  will  stack  into  an  overall 
discrete  integer  design  with  an  acceptable  p  .  After  selecting  the  set  of  “new”  designs 

to  stack,  we  see  that  there  are  s !  ways  of  stacking  them.  Each  restacked  design  from  the 
same  set  of  “new”  designs  results  in  the  same  overall  pmap .  These  findings  provide  a 

flexibility  to  reduce  nonorthogonality  inherent  in  a  singe  subset  of  discrete  variables  in 
MFML  experiments. 

Laterally  appending  a  stacked  design  for  one  subset  of  discrete  variables  to 
another  subset’s  discrete  variables  design  may  increase  pmap  for  the  joined  design, 

DV joined  •  To  minimize  pmap  for  the  joined  design,  we  hold  one  set  of  stacked  designs  as 
the  base  stack,  DV Base,  while  randomly  restacking  the  subset  of  interest,  D  Vlnterest . 
During  random  restacking,  DVInterest  maintains  its  p  ,  but  its  correlation  with  DVBase 
changes,  thereby  changing  pmap  for  DVJoined.  We  continue  restacking  DVInterest  until  pmap 
for  D  VJomed  is  acceptable  or  ceases  to  improve.  Since  pmap  for  the  joined  design  will  not 
be  better  than  the  worse  p  of  the  two  DV s,  it  is  reasonable  to  set  p  from  the  worse 
of  DVBase  and  DVInterest  as  a  threshold.  The  joined  design  is  the  new  base  stack.  We 
incorporate  a  new  DVInlerest  in  the  same  manner  until  all  discrete  variable  subsets  are  in 
the  base  stack. 

In  the  case  of  a  single  integer  variable,  a  simplistic  approach  reaps  significant 
benefits.  We  randomly  permute  a  vector  of  the  possible  values  of  the  variable;  the  vector 
consists  of  each  value  repeated  exactly  the  same  number  of  times  that  there  would  have 
been  stacks.  For  instance,  a  single  variable  with  £  =  4  levels  and  n  =  20,  results  in  a 

ti  20 

vector  consisting  of  the  values  1,  2,  3,  and  4,  each  repeated  —  =  —  =  5  times, 
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x  =  (1,1, 1,1,1,  ...,4,4,4, 4,4) .  The  analyst  generates  a  new  random  pennutation  of  the 
vector,  appends  it  to  DV Base ,  measures  p  ,  and  compares  it  with  a  specified  correlation 

criterion.  A  maximum  number  of  iterations  may  also  be  a  criterion.  A  modification  of 
the  R  program  that  handles  restacking  helps  in  this  process.  We  find  the  single  integer 
variable  approach  useful  when  there  are  few  value  levels  (£<10)  and  applicable  when 
discrete  variables  cannot  be  grouped. 

Our  overall  design  strategy  includes  known  design  techniques.  Centered  designs 
and  two-level  factorial  designs  are  often  part  of  MFML  experiments.  Additionally,  a 
library  of  orthogonal  arrays  (Sloane,  2006)  offers  a  number  of  catalogued  orthogonal 
arrays  (OAs)  that  may  match  the  requirements  for  an  MFML  experiment.  For  readers 
interested  in  more  details  about  OAs,  Hedayat  et  al.  (1999)  have  a  comprehensive 
discussion  of  theory  and  applications.  A  group  of  these  OAs  already  contains  designs 
with  factors  that  have  varying  numbers  of  value  levels.  For  instance,  the  library  lists  an 
MA.  12.2.4.3.1  design.  The  nomenclature  represents  a  mixed  orthogonal  array  (MA)  that 
has  12  runs,  4  two-level  factors,  and  1  three-level  factor.  These  designs  are  very  specific 
to  the  types  and  mixture  of  factors  they  can  address. 

Many  cases  may  occur  for  subsets  of  discrete  variables.  We  do  not  attempt  to 
enumerate  them  and  leave  it  to  the  analyst  to  best  simplify  the  experiment,  while  using 
our  guidance  for  combining  design  methods.  The  goal  for  this  methodology  is  to  develop 
a  design  that  can  explore  the  main  effects  of  each  factor. 

Once  the  final  base  stack  design  is  complete,  the  experimenter  can  apply  our  new 
techniques  or  other  known  methods  to  create  the  base  design  from  the  set  of  continuous 
variables.  The  total  number  of  runs  is  still  n.  The  number  of  variables  for  the  base 
design  is  kcv  =  k-kDV  .  We  expect  to  benefit  from  the  flexibility  of  our  new  designs  if 

known  OLH  and  NOLH  designs  cannot  accommodate  kcv . 

C.  COMBINED  AND  STACKING  METHODOLOGIES 

This  section  describes  our  procedure  for  logically  combining  stacking 
methodologies  and  proven  design  schemes  to  produce  designs  with  acceptable 
nonorthogonality  measures.  Because  there  are  many  possible  types  of  variables  and 
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different  numbers  of  value  levels  that  can  occur,  we  discuss  our  general  methodology  for 
a  fairly  complex  case.  A  complex  case  involves  a  design  that  has  continuous  variables, 
discrete  variables — each  set  with  a  different  number  of  value  levels  and/or  single  DV  that 
has  a  number  of  value  levels  that  matches  neither  of  the  DVs’  sets.  We  include  a  flow 
chart  as  a  template  for  the  MFML  methodology.  Description  of  a  student  thesis  that 
employed  this  process  illustrates  its  utility. 


1.  Steps  for  Creating  a  Mixed  Factor,  Mixed  Level  Design 

This  section  lists  the  general  steps  to  guide  the  creation  of  an  MFML  design  with 
acceptable  nonorthogonality.  In  their  present  fonn,  these  steps  are  helpful,  but  require 
more  specificity.  Experience  and  reason  are  required  to  detennine  how  to  best  employ 
the  methodology  for  each  new  case. 


Step  1:  Determine  the  sets  of  discrete  and  continuous  variables.  Set  n,  to  meet 
requirements  for  all  variables,  such  that  n,>k  + 1. 

Step  2:  Group  the  set  of  DVs  into  subsets  that  have  common  numbers  of  value  levels 
(£j).  Simplify  subsets  and  t.  as  logic  dictates. 

Step  3:  Split  subsets  of  DVs  if  ti<di  in  the  subset. 

Step  4:  Detennine  LCM  =  nDV  from  the  set  of  lt,  /  =  !,...,  w .  Determine  bn  Vi. 


Step  5:  Adjust  the  total  number  of  runs  to  be  n  = 


x  n 


DV  • 


Step  6:  Use  new  and  past  techniques  to  create  the  base  design  with  an  acceptable  p  . 

Tl 

Step  7:  Determine  the  number  of  stacks  required  for  each  set  of  DVs:  s.  = — ,  Vi  or 


=  h  x 


DV 


,  Vi. 


Step  8:  Create  a  stacked  design  for  each  set  of  DVs,  restack,  and  append  until  the  overall 
design  meets  the  threshold  for  p  . 


It  is  common  for  p  of  the  stacked  DVs  to  dictate  p  for  the  overall  design. 
At  times,  the  experimenter  may  wish  to  stack  the  base  design. 
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2.  Design  Best  Practices  within  the  Mixed  Factor,  Mixed  Level  Process 

Intertwining  proven  design  schemes  with  the  MFML  methodology  is  integral  to 
our  overall  process.  Full  and  fractional  factorial  designs,  centered  designs,  and  simple 
LHs  are  a  few  of  the  proven  techniques  that  also  play  a  role  in  creating  a  full  MFML 
design.  Although  we  have  concentrated  on  numeric  variables,  many  experiments  include 
qualitative  variables.  Binary  and  categorical  variables  are  often  essential  to  the  scientific 
method. 

Recall  that  a  major  objective  in  the  MFML  process  is  for  a  balanced  design. 
Understanding  that  nonnumeric  variables  play  no  part  in  computing  correlation,  it  is  still 
important  for  our  technique  to  incorporate  these  variables  in  the  MFML.  We  can  cross 
orthogonal  designs  derived  from  our  techniques  with  other  designs.  The  flexibility  that 
we  offer  in  design  dimensions  eases  the  ability  to  match  the  dimensions  of  a  balanced 
nonnumeric  design.  It  is  not  uncommon  to  have  five  binary  variables  in  an  experiment. 
A  full  factorial  design  requires  2 5  =  32  runs.  In  the  absence  of  a  catalogued  orthogonal 
(or  nearly  orthogonal)  design,  we  can  offer  a  32-run  design  from  our  technique  with  the 
set  of  numeric  variables.  We  can  also  incorporate  larger  resolution  III  or  resolution  V 
fractional  factorial  designs,  such  as  those  in  Sanchez  &  Sanchez  (2005). 

D.  EXPLORING  FIRST  RESPONSE  TO  A  BOMB  ATTACK  USING  A 

MIXED-FACTOR,  MIXED-LEVEL  DESIGN 

This  section  demonstrates  the  utility  of  the  MFML.  We  discuss  the  general 
methodology  in  a  fairly  complex  case  involving  a  graduate  student’s  thesis.  The  study  is 
ideal  for  illustration  because  it  explores  a  mixture  of  qualitative  and  quantitative 
variables,  which  include  integer  and  continuous  variables. 

1.  The  Thesis  Problem 

Major  Jon  Roginski’s  thesis  (2006),  “Simulating  Emergency  Response  Using 
Multi-Agent  Simulation,”  studies  first  response  to  a  bomb  attack  in  Baltimore’s  Inner 
Harbor  area  (see  Figure  11).  It  leverages  simulation  techniques  to  detennine  the  efficient 
use  of  resources  for  the  Department  of  Homeland  Security  (DHS)  within  the  National 
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Exercise  Program.  DHS’s  previous  efforts  to  employ  simulations  have  cost  tens  of 
millions  of  dollars,  with  an  unsatisfactory  return  on  investment  (Roginski,  2006). 


Figure  11.  Baltimore  harbor  vignette  terrain:  actual  photo  and  within 

multiagent  simulation  (Roginski,  2006). 


Major  Roginski  adapts  an  existing  organizational  learning  process  and  integrates 
low-  and  high-resolution  simulation  to  provide  decision  support.  His  study  presents  the 
potential  benefits  of  low-resolution  simulation,  using  efficient  experimental  design  and 
high-performance  computing.  He  examines  a  48-dimensional  response  surface.  These 
factors  are  a  mixture  of  integer  and  DVs,  as  well  as  three  two-level  variables  and  two 
three-level  variables. 

This  is  quite  a  design  challenge.  Without  the  use  of  an  efficient  design, 
Major  Roginski  calculates  that  a  traditional  gridded  design  for  48  factors  and  30 

24 

replications  requires  1.60  x  10  CPU  hours,  or  approximately  116  trillion  times  the  age 
of  the  universe,  even  with  the  help  of  the  Maui  High  Performance  Computer  Center 
(MHPCC).  Using  an  efficient  MFML  design  and  the  MHPCC,  all  experiments  are 
completed  in  less  than  three  weeks  or  156  CPU  centuries  (Roginski,  2006). 
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2.  Applying  the  Mixed-Factor,  Mixed-Level  Methodology  to  a 
Thesis  Question 

Distilling  this  problem  requires  an  examination  of  the  variables  in  the  experiment. 
The  analyst  considers  48  variables,  three  of  which  are  color  or  pattern  indicators  in  the 
simulation.  These  three  variables  are  not  part  of  the  design.  Five  other  variables  are 
discrete:  three  have  two-levels  and  two  have  three-levels.  Sloane  (2006)  does  offer  some 
alternatives  for  this  design,  but  the  designs  in  the  orthogonal  array  library  are  fairly 
restrictive  in  the  number  of  runs  and  number  of  mixed  factors  they  can  handle. 

Table  23  shows  40  of  the  variables  that  are  the  focus  of  design  development.  The 
table  shows  that  many  of  the  variables  have  only  eight  levels.  Our  previous  discussions 
in  this  chapter  state  that  variables  with  less  than  ten  levels  are  problematic.  A  number  of 
the  variables  have  144  or  more  levels,  which  may  be  considered  as  continuous.  Lastly, 
four  variables  have  1 1  levels.  Table  23  separates  the  different  number  of  value  levels. 


79 


Table  23.  Forty  of  the  variables  in  Major  Roginski’s  experiment  are  displayed.  The 
variables  are  all  integers,  but  some  have  enough  value  levels  to  be  considered 
as  continuous,  as  the  shaded  rows  indicate. 


Variable  Name 

Min  Value 

Max  Value 

#  Value  Levels 

Type 

Num  Agl 

0 

7 

8 

Integer 

Num  Ag2 

0 

7 

8 

Integer 

Num  Ag3 

0 

7 

8 

Integer 

Num  Ag4 

0 

7 

8 

Integer 

Num  Gun 

0 

7 

8 

Integer 

Num  EMT  SJ 

0 

7 

8 

Integer 

Num  EMT  MM 

0 

7 

8 

Integer 

Num  ESU1 

0 

7 

8 

Integer 

Num  ESU2 

0 

7 

8 

Integer 

Num  ESU3 

0 

7 

8 

Integer 

Num  ESU4 

0 

7 

8 

Integer 

Num  ESU5 

0 

7 

8 

Integer 

Num  Folll 

0 

7 

8 

Integer 

Num  Foll2 

0 

7 

8 

Integer 

Num  Foll3 

0 

7 

8 

Integer 

Num  Traffl 

0 

7 

8 

Integer 

Num  Traff2 

0 

7 

8 

Integer 

Num  Traff3 

0 

7 

8 

Integer 

Num  Traff4 

0 

7 

8 

Integer 

Num  Traff5 

0 

7 

8 

Integer 

Num  Traff6 

0 

7 

8 

Integer 

Num  Traff7 

0 

7 

8 

Integer 

Num  Traff8 

0 

7 

8 

Integer 

Num  Traffl) 

0 

7 

8 

Integer 

Num  TrafflO 

0 

7 

8 

Integer 

Num  Traffl  1 

0 

7 

8 

Integer 

Num  Traffl 2 

0 

7 

8 

Integer 

Desire  Civ 

1 

144 

144 

Integer-Continuous 

Color 

1 

144 

144 

Integer-Continuous 

Marksman  Pol 

1 

144 

144 

Integer-Continuous 

Prob  Com 

1 

144 

144 

Integer-Continuous 

Vul  Ag 

1 

144 

144 

Integer-Continuous 

Vul  Gun 

1 

144 

144 

Integer-Continuous 

Marksman  Ag 

1 

144 

144 

Integer-Continuous 

Marksman  Gun 

1 

144 

144 

Integer-Continuous 

Num  Civ 

0 

150 

151 

Integer-Continuous 

Num  SWAT 

5 

15 

11 

Integer 

Attr  in  Orders 

0 

10 

11 

Integer 

Attr  in  Ag 

0 

10 

11 

Integer 

Eff  Bomb 

0 

10 

11 

Integer 
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There  are  a  few  approaches  that  the  experiment  may  use  as  a  modification  of  the 
steps  listed  previously.  We  acknowledge  that,  at  the  time  of  creating  this  design,  VMIN 
had  not  yet  been  fully  developed.  Because  27  of  the  variables  have  only  eight  value 
levels,  use  of  the  single  integer  method  is  a  useful  scheme.  Nine  of  the  variables  have 
144  or  more  value  levels.  The  experimenter  may  consider  these  variables  to  have  a 
common  £  of  144.  We  easily  develop  an  NOLH  with  previous  techniques. 

The  remaining  four  variables  have  1 1  value  levels.  Because  £  >  d  for  this  subset, 
creation  of  a  4  x  11  NOLH  design  is  possible,  followed  with  a  stacking  approach.  The 
overall  design  will  require  at  least  144  design  points.  Therefore,  it  is  possible  to  develop 
an  FRLH  design  for  all  these  numeric  variables.  A  large  number  of  design  points 
mitigates  round  off  issues  and  converts  the  actual  values  of  DVs  into  a  nearly  orthogonal 
design  matrix. 

The  decision  to  treat  40  of  the  integer  variables  as  continuous  variables,  and 
develop  a  base  design  from  them,  involves  the  LCM  in  the  two-level  and  three-level 
variables  in  the  experiment.  Three  two-level  variables  require  an  eight-run  design,  which 
can  be  orthogonal.  A  full-factorial  design  for  two  three-level  variables  requires  nine 
runs.  The  LCM  for  these  two  sets  of  DVs  is  72.  Because  a  number  of  the  integer 
variables  have  144  levels,  two  stacks  of  the  combined  DV  sets  matches  the  base  design. 

To  illustrate,  Table  24  has  the  eight  design  points  of  a  full-factorial  design  for  the 
two-level  variables.  They  match  with  one  design  point  of  the  two,  three-level  variables 
(diagonal  shading).  Since  there  are  nine  design  points  for  the  two,  three-level  variables, 
these  eight  design  points  for  the  two-level,  full-factorial  design  is  repeated  for  each 
unique  design  point,  which  results  in  72  runs. 
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Table  24.  Excerpt  of  the  DV  subset  design  for  the  MFML,  containing  24  of  the  144 
total  design  points,  shows  how  a  full,  two-level  factorial  design  for  three  binary 
variables  aligns  with  three  design  points  of  two  discrete  variables,  each  with  three 
value  levels.  We  shade  sections  of  the  table  to  emphasize  the  match  between  a 
single  design  point  for  the  discrete,  non-binary  variables  and  a  full  two-level 
factorial  for  the  binary  variables. 


ESU 

FO  Police 

Traffic 

Time  OP 

Panic 

0 

0 

0 

1 

1 

0 

0 

1 

1 

1 

0 

1 

0 

1 

1 

0 

1 

1 

1 

1 

1 

0 

0 

1 

1 

1 

0 

1 

1 

1 

1 

1 

0 

1 

1 

1 

1 

1 

1 

1 

0 

0 

0 

1 

2 

0 

0 

1 

1 

2 

0 

1 

0 

1 

2 

0 

1 

1 

1 

2 

1 

0 

0 

1 

2 

1 

0 

1 

1 

2 

1 

1 

0 

1 

2 

1 

1 

1 

1 

2 

0 

0 

o 

1 

3 

0 

0 

1 

1 

3 

0 

1 

0 

1 

3 

0 

1 

1 

1 

3 

1 

0 

0 

1 

3 

1 

0 

1 

1 

3 

1 

1 

0 

1 

3 

1 

1 

1 

1 

3 

Stacking  this  set  of  72  design  points  atop  one  another  results  in  144  total  design 
points,  which  correspond  with  the  number  of  runs  in  the  base  design.  Using  an  iterative 
application  of  Florian’s  method  with  the  remaining  40  variables  (assuming  all  are 
continuous)  and  n  =  144  produces  a  base  design  with  p  =0.00766.  When  the  actual 

values  of  the  integer  variables  treated  as  continuous  are  converted  (by  rounding)  into  the 
design,  the  resulting  nonorthogonality  measure  is  p  =  0.0593 .  Despite  the  many 
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variables  with  small  numbers  of  value  levels,  the  nonorthogonality  measure  remains 
relatively  small.  The  complete  n  =  144,  k  =  45  design  is  in  Appendix  C. 

Examination  of  this  design  reveals  advantages  of  the  MFML  approach.  An 
NOLH  design  using  Cioppa’s  (2002)  method  to  explore  45  factors  requires  m  =  9,  which 
corresponds  with  n  =  1,025  and  k  =  46.  This  is  if  all  variables  are  considered  continuous. 
If  the  five  qualitative  variables  are  separated,  an  NOLH  design  to  examine  the  remaining 
40  variables  still  requires  1,025  runs  since  m  =  8  only  results  in  k  =  37  and  n  =  513.  A 
conventional  crossed  design  approach  to  combine  the  qualitative  variables  with  the 
NOLH  could  lead  to  73,800  runs.  Furthermore,  we  are  unaware  of  the  existence  of  either 
a  40-  or  45-variable  NOLH  design  with  so  few  design  points.  Meanwhile,  our  methods 
have  developed  an  FRLH  design  that  explores  up  to  172  variables  within  193  runs  ( A,1”  ) 

and  possesses  a  nonorthogonality  measure  of  0.0275.  The  MFML  design  reduces  the 
number  of  runs  in  the  NOLH  by  more  than  85%,  while  preserving  enough  of  the 
orthogonality  properties  of  the  base  design  to  gain  advantages  in  the  analysis  of  the 
simulation  output. 

Major  Roginski’s  study  successfully  examines  all  48  factors  using  this  design. 
His  analysis  of  the  data  from  the  experimental  design  yields  major  findings,  undiscovered 
from  previous  studies: 

•  The  most  important  factor  in  achieving  success  in  crisis  mitigation  is  the 
effectiveness  of  the  police. 

•  If  a  police  force  is  not  well  trained,  they  may  achieve  greater  success  by 
being  less  persistent  with  individuals. 

•  Well  established,  well  executed  standard  operating  procedures  (SOPs) 
may  play  a  more  important  role  in  first  response  operations  than 
interagency  communication. 

•  There  is  a  diminishing  return  after  a  certain  level  of  first  responder 
training.  It  may  be  more  effective  to  leverage  resources  elsewhere  after 
reaching  that  level. 

The  ability  to  examine  factors  in  isolation  using  the  MFML  design  contributes  to 
the  validity  and  significance  of  these  findings  (Roginski,  2006). 
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E.  SUMMARY 

MFML  experiments  are  difficult  propositions  for  analysts.  They  are  typically 
problematic,  even  when  using  orthogonal  or  nearly  orthogonal  designs.  Conversion  of 
actual  values  onto  the  raw  OLH  and  NOLH  designs  often  result  in  poor  orthogonality 
properties,  reducing  the  advantages  of  these  efficient  designs. 

Our  methodology  produces  a  balanced  MFML  design  that  exhibits  near 
orthogonal  properties  that  benefit  analysts.  Using  guidelines  to  partition  and  shape  the 
subsets  of  DVs  and  stacking  techniques,  we  apply  proven  design  schemes  to  develop  an 
overall  balanced  design.  The  base  design  consists  of  continuous  variables,  or  integer 
variables  with  many  value  levels.  The  flexibility  of  our  previous  techniques  pennits 
manipulation  of  the  base  design  to  conform  to  the  needs  of  set  of  DVs. 

A  new  methodology  for  stacking  experimental  designs  is  an  important  feature  of 
the  overall  process  for  creating  MFML  designs.  Random  column  permutation  of  an 
original  NOLH  design  does  not  create  a  new  design,  but  randomly  restacking  these 
column-permuted  designs  generally  results  in  a  completely  new  overall  design. 
Experience  and  communication  with  the  customer,  as  well  as  knowledge  about  the 
experiment  of  interest,  directs  the  analyst  to  logically  apply  this  methodology.  A 
flowchart  or  list  of  steps  cannot  fully  capture  this  art. 

A  use-case  demonstrates  the  utility  of  the  MFML  methodology.  Major  Jon 
Roginski’s  study  (2006)  involves  30  replications  of  a  design  consisting  of  45  continuous 
and  integer  DVs  with  varying  numbers  of  value  levels,  as  well  as  simulation-specific 
factors.  Roginski  (2006)  estimates  that  if  a  full  factorial  design  were  used,  the  required 
time  to  complete  these  experimental  runs  approximately  equals  116  trillion  times  the  age 
of  the  universe.  Previous  NOLH  designs  by  Cioppa  (2002)  to  explore  just  45  factors 
require  m  =  9,  which  corresponds  to  n  =  1,025,  assuming  all  continuous  variables.  A 
conventional  crossed  design  approach  to  combine  the  qualitative  variables  with  the 
NOLH  could  lead  to  73,800  runs.  Our  new  methodology  creates  an  MFML  design  with 
144  runs  to  explore  45  variables.  By  using  the  computing  poser  of  the  MHPCC  along 
with  the  efficiency  of  the  nearly  orthogonal  MFML  design,  Major  Roginski’s  experiment 
requires  less  than  three  weeks  or  156  CPU  centuries  (Roginski,  2006). 
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VI.  THE  COMPLETE  FLEXIBLE  OLH/NOLH  METHODOLOGY: 
EXPLORING  SOLDIER-LEVEL  NETWORKS 


We  implement  our  new  designs  to  support  the  United  States  Army  Training  and 
Doctrine  Command  Analysis  Center  in  a  study  of  network-centric  warfare.  Our  design 
helps  assess  the  impact  that  Soldier-level,  network-enabled  capabilities  have  on  a  truck 
tenninal’s  cargo  operations  as  part  of  a  Joint  Force  sustainment  base.  The  research 
studies  current  joint  distribution  operations  and  the  role  that  transportation  operations 
play  in  the  joint  context.  The  system  of  interest  is  the  operational  concept  of  the 
Centralized  Receiving  and  Shipping  Point  (CRSP)  and  its  organization. 

A.  PROBLEM  STATEMENT 

Major  Francisco  Baez,  United  States  Anny,  (2008)  examines  up  to  20  factors  of 
mixed  types  with  mixed  numbers  of  value  levels  to  answer  the  following 
research  questions: 

•  Which  network-enabled  capability  gaps  exist  in  the  execution  of 
transportation  Soldiers  performing  terminal  cargo  operations  tasks,  under 
the  identified  conditions,  to  the  identified  performance  standards? 

•  Which  distribution  structures  and  types  of  network-enabled  capabilities 
allow  transportation  Soldiers  to  accomplish  their  task  to  specified 
standards,  under  given  conditions? 

•  Are  the  network-enabled  capabilities  currently  available  to  individual 
transportation  Soldiers? 

•  What  is  the  operational  impact  of  leaving  the  gap  unfilled? 

This  study  focuses  on  three  dissimilar  communications  network  topologies  that 
include  In-Transit  Visibility  (ITV),  networked  communications,  and  infonnation  systems 
that  provide  network-wide  visibility  of  node  and  mode  status  in  a  shared  Logistics 
Common  Operating  Picture  (LOOP).  Operational  scenarios  from  subject  matter  experts 
will  employ  these  networks  in  a  simulation. 

Factors  come  from  concept  specific  attributes  documents  (JCS,  2005).  The  study 
considers  ITV-availability,  ITV-accuracy,  LCOP-update  rates,  probability  of 
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communications,  latency,  and  communication  relay  capability  as  parameters  that 
influence  network  capabilities.  Noise  factors  such  as  resources  available,  convoys  per 
hour,  and  convoy  mix  cases  examine  aspects  of  network-enabled  capabilities  on  a 
broader  basis. 

The  results  from  this  effort  can  support  or  refute  the  findings  from  the  functional 
needs  analysis  in  the  Joint  Capabilities  Integration  and  Development  System. 
Recommendations  from  this  research  may  also  serve  to  shape  future  capabilities 
requirements.  Most  importantly,  it  will  benefit  Soldiers  who  operate  from  fixed-based 
facilities  in  theaters  of  operation. 

B.  DESIGN  OF  EXPERIMENTS  AND  SIMULATION  RUNS 

This  simulation  study  uses  modeling  and  simulation  and  an  efficient  experimental 
design.  Major  Baez’  study  requires  an  MFML  design  for  this  computer  experiment.  His 
simulation  includes  1  categorical  variable,  5  binary  variables,  and  14  numeric  variables 
(integer  and  continuous)  with  different  numbers  of  value  levels.  Converting  the  actual 
values  of  discrete  variables  onto  a  raw  NOLH,  results  in  an  overall  design  with  poor 
orthogonality  properties.  We  create  a  base  design  using  the  methods  of  Chapter  V  (see 
also  Hernandez,  2008b)  for  one  integer  variable  with  16  levels  and  12  continuous 
variables,  using  a  subset  of  the  columns  from  the  n  =  16,  k  =  15  saturated  design  shown 
in  Table  25.  We  use  an  MFML  design  that  consists  of  this  new  S-NOLH  design, 
combining  them  with  stacking  methods,  and  intelligently  applying  proven  design 
techniques. 
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Table  25.  A  saturated  design  for  n  =  16  and  k  =  15  is  the  foundation  for  creating  a 
base  design  for  an  MFML  design.  Its  correlation  measure  is  0.0471,  with  a 

condition  number  of  1.319. 


kl 

k2 

k3 

k4 

k5 

k6 

k7 

k8 

k9 

klO 

kll 

kl2 

kl3 

kl4 

kl5 

nl 

10 

8 

13 

6 

16 

8 

15 

16 

14 

3 

12 

2 

13 

7 

9 

n2 

13 

12 

4 

8 

13 

15 

3 

4 

9 

4 

3 

3 

4 

2 

13 

n3 

7 

6 

12 

1 

2 

11 

4 

5 

11 

5 

16 

4 

3 

12 

5 

n4 

9 

16 

6 

4 

12 

5 

8 

14 

5 

6 

10 

15 

2 

15 

12 

n5 

11 

10 

8 

5 

15 

1 

7 

2 

15 

16 

7 

9 

9 

9 

2 

n6 

14 

4 

1 

10 

3 

7 

12 

6 

16 

7 

11 

14 

12 

10 

16 

n7 

6 

9 

14 

16 

11 

3 

9 

1 

3 

8 

15 

6 

7 

8 

15 

n8 

1 

13 

16 

7 

5 

10 

5 

7 

12 

9 

1 

10 

16 

11 

14 

n9 

12 

7 

10 

2 

1 

2 

13 

12 

1 

13 

4 

5 

8 

3 

11 

nlO 

16 

14 

11 

13 

6 

12 

2 

13 

6 

11 

14 

13 

14 

4 

4 
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2 

3 

5 

3 

14 

16 

10 

8 

4 

15 

13 

12 

11 

5 

10 
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3 

2 

9 

11 

9 

4 

6 

11 

10 

1 

5 

16 

6 

1 

3 
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15 

1 

15 

12 

10 

14 

11 

9 

7 

10 

2 

11 

5 

16 

7 

nl4 

5 

5 

2 

14 

8 

6 

1 

15 

8 

12 

8 

1 

10 

14 

8 
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4 

15 

7 

15 

4 

13 

16 

10 

13 

14 

9 

8 

1 

6 

6 

nl6 

8 

11 

3 

9 

7 

9 

14 

3 

2 

2 

6 

7 

15 

13 

1 

We  select  any  subset  of  13  columns  and  stack  it,  building  a  32-run  by  13-factor 
base  design  to  match  the  full  factorial  design  ( 25  =  32  runs)  of  five  binary  variables. 
Next,  we  cross  the  resulting  design  with  the  three-level  categorical  variable  to  create  a 
96-run  design.  Randomly  assigning  value  levels  for  the  three-level  integer  variable  to 
match  the  overall  design  creates  an  NOLH.  The  final  MFML  design  contains  96  runs  to 
address  20  factors  in  one  operational  scenario  (see  Appendix  D).  It  retains  the  raw 
design’s  nonorthogonality  measure  of  p  =  0.0471 . 

We  remark  that  this  design  was  not  available  when  Major  Baez  did  the  production 
runs  for  his  thesis.  As  an  alternative  design,  Major  Baez  assigns  values  for  the  binary  and 
categorical  variables  within  the  257-design  point  design,  since  it  can  handle  up  to  29 
variables  (Sanchez,  2006).  The  complete  number  of  design  points  to  examine  three 
different  scenarios  is  771.  The  experiment  involves  10  replications  for  each  scenario  for 
each  of  the  257  design  points,  for  a  total  of  7,710  computational  experiments.  This  is  far 
less  than  the  computational  effort  that  would  be  required  for  10  replications  of  a  257- 
design  point  NOLH  crossed  with  the  full,  two-level  factorial  design,  but  less  efficient 
than  the  MFML  design.  We  note  that  after  rounding  to  the  actual  values,  the  overall 
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nonorthogonality  measure  from  the  catalogued  NOLH  design  is  pmap  =0.0962 — more 

than  twice  that  of  the  MFML,  while  still  requiring  nearly  three  times  the  number  of  runs. 

Baez  (2008)  executes  the  MFML  experiment  with  10  replications  and  96  design 
points  (Hernandez  et  ah,  2008)  for  each  of  the  three  communications  scenarios,  totaling 
2,880  computational  experiments  to  serve  as  a  validation  experiment. 

C.  ANALYSIS 

Analysis  of  the  simulation  output  data  was  conducted  primarily  through  the  JMP 
Statistical  Discovery  Software  (2004),  a  product  of  the  Statistical  Analysis  Software 
institute.  Major  Baez  chose  JMP  (2008)  for  its  data  visualization  features  that  allow  the 
user  to  interactively  investigate  data,  and  to  refine  and  understand  the  analysis  results  in  a 
dynamically-linked  spreadsheet  and  graphical  environment  (Baez,  2008). 

One  analysis  approach  was  construction  partition  trees,  which  we  will  use  to 
illustrate  the  utility  of  our  new  designs.  Partition  trees  recursively  split  the  data  from  the 
simulation  into  homogeneous  subsets  in  accordance  with  the  relationship  between  the 
response  variable  and  the  predictors.  Each  split  considers  all  possible  cuts  or  groupings, 
given  the  current  state  of  the  tree  to  select  a  partition  with  the  largest  likelihood-ratio  Chi- 
square  statistic.  A  threshold  of  5%  change  in  R“  from  the  previous  split  detennines  when 
to  split  the  data  set  (Baez,  2008).  This  method  provides  insights  about  the  most 
significant  factors  influencing  the  response  variables. 

Major  Baez  (2008)  focuses  his  study  on  comparing  the  three  different  network 
structures  and  identifying  those  factors  that  have  the  greatest  impact  on  each  structure’s 
performance,  as  well  as  differences  in  overall  performance  for  the  three  structures.  An 
analysis  of  the  mean  time  in  the  CRSP  is  only  interpreted  as  the  average  time  to  process 
the  specified  number  of  convoys  in  the  context  of  the  scenario,  and  not  necessarily  the 
“typical”  time  for  a  convoy. 

The  following  plots  from  the  larger  design  are  confirmed  with  our  more  efficient 
design.  They  demonstrate  interesting  results  that  confirm  the  complexity  of  the  system 
and  point  to  the  need  for  further  analysis. 
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The  plots  from  JMP  in  Figure  12  show  the  utility  of  LFI  designs  (Baez,  2008). 
The  graphics  present  the  time  in  the  CRSP  as  the  z'th  convoy  appears.  They  reveal 
interesting  results  from  the  different  scenario  settings.  Design  Points  A  and  B  plots  show 
two  scenarios  with  an  arrival  rate  of  one  convoy  per  hour.  Design  Point  A  has  low  traffic 
intensity  and  Design  Point  B  has  somewhat  high  traffic  intensity.  The  plot  for  Design 
Point  A  indicates  that  after  a  certain  point  the  time  in  CRSP  begins  decreasing  slightly, 
but  steadily  until  the  terminating  event.  The  plot  for  Design  Point  B  outlines  a  supply 
system  that  is  unable  to  handle  the  process  for  this  given  combination  of  factor  settings. 
The  time  in  CRSP  continues  to  increase,  indicating  that  very  large  queues  are  building 
and  each  successive  convoy  takes  longer  to  process  than  the  last.  All  three  networks 
exhibit  very  consistent  behavior  (Baez,  2008). 


Figure  12.  Time  in  CRSP  by  convoy  number  per  network  type  when  the  rate  of  convoy 
arrival  is  two  convoys  per  hour  (Baez,  2008). 

The  first  branch  of  the  partition  tree  in  Figure  13  reveals  that  traffic  intensity  and 
IT V- Available  have  the  greatest  influence  on  the  mean  time  in  CRSP.  When  traffic 
intensity  is  near  zero  there  is  little  queuing  in  the  system,  while  traffic  intensity  greater 
than  1.0  results  in  substantial  queuing  in  the  system  (Baez,  2008).  Furthermore,  the  mean 
time  in  CRSP  improves  with  timely  and  reliable  ITV  data,  even  when  traffic  intensity 
nears  1.0.  The  partition  tree  clearly  identifies  that  these  two  factors  have  the  greatest 
influence  on  the  mean  response  and  is  the  framework  for  further  analysis. 
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Figure  13.  Partition  tree  for  mean  time  in  CRSP,  hierarchical  network  structure 
(Baez,  2008) 

Major  Baez  also  applies  multiple  linear  regression  analysis.  The  multiple 
regression  model,  as  the  approximating  function  for  the  true  functional  relationship 
between  the  response  and  regressor  variables,  examines  the  behavior  of  variables  of 
interest  (Montgomery  et  ah,  2001).  Baez  uses  a  stepwise  linear  regression  method  to  fit 
regression  metamodels  for  the  mean  time  in  CRSP  as  a  function  of  main  effects, 
quadratic  effects,  and  two-way  interactions.  The  stepwise  regression  control  in  JMP  also 
helped  identify  the  most  influential  factors. 

The  final  regression  metamodel  is  in  Figure  14.  It  yields  an  R"  of  0.77  and 
contains  two  main  effect  terms  and  one  interaction  term.  Other  models  that  are 
considered  include  additional  terms  such  as  other  main  effects,  interactions,  and  quadratic 
effects.  However,  these  additional  terms  explain  only  1%  more  of  the  variability. 
Therefore,  the  parsimonious  model  is  selected.  The  results  suggest  that  traffic  intensity 

and  ITV- Available  are  ranked  as  the  two  most  influential  factors.  Traffic  intensity  is  the 
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dominant  factor,  as  indicated  by  a  large  |t-ratio|.  These  results  serve  to  reinforce  and 
complement  those  findings  in  the  partition  tree. 


Whole  Model 


Summary  of  Fit 

RSquare 

0.768786 

RSquare  Adj 

0.766044 

Root  Mean  Square  Error 

123.8509 

Mean  of  Response 

204.0588 

Observations  (or  Sum  Wgts) 

257 

Analysis  of  Variance 

Sum  of 


Source 

DF 

Squares 

Mean  Square 

F  Ratio 

Model 

3 

12903598 

4301199 

280.4084 

Error 

253 

3880781 

15339 

Prob  >  F 

C.  Total 

256 

16784379 

<.0001* 

Parameter  Estimates 


Term 

Estimate 

Std  Error 

t  Ratio 

Prok»|t| 

Intercept 

201.08553 

16.68101 

12.05 

<.0001* 

ITV_Available 

-311.4838 

26.82984 

-11.61 

<.0001* 

traffic  intensity 

389.47486 

15.0165 

25.94 

<.0001* 

(ITV_Available-0.50265)*(traffic  intensity-0.41024) 

-248.2801 

48.32956 

-5.14 

<.0001* 

Figure  14.  Regression  metamodel  for  mean  time  in  CRSP,  hierarchical 
network  structure  (Baez,  2008) 


The  findings  from  the  96-run  MFML  design  are  similar  and  reinforce  the  larger 
design.  In  validating  the  larger  design,  the  MFML  proves  itself  as  a  viable  exploratory 
tool  that  can  save  the  experimenter  time  and  money.  In  this  case,  the  MFML  would  have 
saved  the  experimenter  two-thirds  of  the  resources  expended  for  the  full  design. 


D.  WIDESPREAD  UTILITY  OF  NEW  NOLH  DESIGNS 

In  addition  to  Major  Baez,  over  a  dozen  thesis  students  have  used  some  form  of 
FRLH,  NOLH,  S-NOLH,  and  MFML  designs  to  complete  their  studies. 
Captain  Joshua  Ang  (2006),  Singapore  Anny,  used  FRLH  designs  to  understand  their 
utility  and  identify  areas  to  improve  the  library  of  OLH  and  NOLH  designs. 
Major  Chris  Michel  (2006),  United  States  Marine  Corps,  used  earlier  versions  of  the 
FRLH  to  find  more  efficient  designs  to  evaluate  the  Marine  Corp’s  Artillery  Triad. 
Major  Jon  Alt  (2006),  United  States  Army,  explored  tactics,  techniques,  and  procedures 
that  troops  in  small  combat  units  could  apply  in  response  to  the  challenges  of  future 
combat  systems  and  their  employment.  Major  Earl  Richardson  (2006),  United  States 
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Marine  Corps,  employed  NOLHs  to  study  sensitivities  in  the  Infantry  Warrior 
Simulation  (IWARS). 
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VII.  CONCLUSIONS  AND  RECOMMENDATIONS  FOR 

FUTURE  STUDY 


Immense  computing  power  enables  the  simulation  of  increasingly  complex 
problems,  thereby  allowing  analysts  to  greatly  assist  decision  makers  in  all  sectors  of 
society.  Studies  in  human  behavior  and  biomimetics  often  use  computer  simulations  that 
rely  on  efficient  designs  of  experiments,  adopting  techniques  that  allow  analysts  to 
“systematically  examine  a  broader  range  of  possible  innovations  .  .  (Booker,  2005). 
The  United  States  Marine  Corps  Warfighting  Lab  uses  agent-based  models  in  computer 
experiments  to  understand  nontraditional  combat  through  the  aggregate  study  of  large 
numbers  of  nonlinear  actors  (Ilachinski,  1997). 

In  particular,  Department  of  Defense  faces  many  of  the  new  challenges  that 
Brown’s  Grave  New  World  (2003)  uncovers.  The  impact  of  technology  on  global 
security,  the  importance  of  nonmilitary  factors  in  military  affairs,  and  the  role  that 
transnational  actors  play  in  disrupting  nation-states  are  only  the  beginning  in  a  growing 
list  of  variables  that  decision  makers  must  consider — often  making  physical 
experimentation  infeasible. 

Analysts  frequently  turn  to  computational  experimentation,  such  as  computer 
simulation,  to  address  the  hundreds  of  input  variables  for  the  system(s)  of  interest.  In  so 
doing,  analysts  search  for  schemas  that  gain  the  most  information  from  experiments. 
Computational  and  monetary  costs  for  one  experimental  run  push  analysts  to  find 
efficient  designs,  especially  when  it  is  ill-advised  to  make  assumptions  about  variables  of 
interest.  Earlier  designs  of  experiments  may  not  always  be  suited  for  computer 
experimentation.  The  recent  surge  of  methods  to  construct  efficient  designs,  most 
notably  those  with  orthogonal  properties  (OLH  and  NOLH),  offers  analysts  a  new 
methodology  for  exploring  extremely  complex  problems  (Kleijnen  et  ah,  2005).  The 
principal  shortfalls  of  these  designs  are  their  strict  limitations  in  dimensionality  and  their 
scarcity.  Techniques  that  overcome  these  restrictions  will  make  orthogonal  and  nearly 
orthogonal  designs  an  important  class  of  design  for  exploring  large  models. 
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Our  new  techniques  to  construct  experimental  designs  provide  the  freedom  to 
create  a  variety  of  OLH  and  NOLH  designs  for  many  new  design  dimensions.  The 
flexibility  of  this  methodology  enables  the  construction  of  a  design  with  the  exact 
dimensions  that  an  analyst  requires.  It  replaces  the  awkwardness  of  fitting  value  levels  in 
a  catalogued  design,  by  using  each  value  level  more  than  once  when  a  variable  contains 
fewer  levels  than  runs.  Additionally,  our  method  possesses  the  ability  to  produce  new 
designs  in  a  matter  of  hours. 

Introducing  a  new  family  of  S-NOLHs  contributes  to  the  discipline  of  design  of 
experiments.  The  efficiency  of  examining  the  experimental  space  of  k  variables  from  a 
sample  size,  n  =  k  +  1  in  a  LH  offers  great  potential  for  simulation  experiments.  The 

(k\ 

ability  to  select  any  one  of  subsets  of  columns  or  exchange  columns  between  subsets 

v2y 

in  an  S-NOLH  design  gives  analysts  unprecedented  flexibility. 

This  new  construction  methodology  fills  much  of  the  void  in  the  OLH  and  NOLH 

library.  Catalogued  designs  from  Cioppa’s  (2002)  original  work  explore  k  =  m  + 

factors.  The  gap  between  any  two  sequentially-sized  designs  is  m  -  1  factors  and  grows 
as  m  increases.  S-NOLH  designs  fill  gaps  between  the  number  of  factors,  as  well  as  the 
combinatorial  voids  created  when  also  considering  the  sample  size,  n  =  2'n  +  1.  For 
instance,  a  gap  exists  between  O)1  and  Nff  when  there  is  a  need  to  address  between 

8  and  10  factors  for,  at  most,  17  runs.  A  saturated  N\l  easily  fills  this  gap.  Currently,  no 

OLH  or  NOLH  designs  exist  between  n  =  17  and  n  =  33.  We  build  a  saturated  N™ 

(Appendix  A)  that  does  not  confonn  to  n  or  k  restrictions  that  exist  in  previous  OLH  and 
NOLH  construction  methods.  We  have  catalogued  a  number  of  saturated  designs  that 
address  from  2  to  63  factors,  with  no  more  than  64  runs.  More  designs  are  available  for 
the  asking. 

One  research  area  that  requires  further  study  is  the  perfonnance  of  our  new 

designs  in  the  presence  of  interaction  and  quadratic  tenns.  OLH  and  NOLH  designs  from 

Ye  (1998),  Cioppa  (2002),  and  Ang  (2006)  retain  their  orthogonal  properties  in  the 

presence  of  a  single  quadratic  or  two-factor  interaction  term.  Our  preliminary  findings 
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show  that  RLH  outperforms  OLH  and  NOLH  in  tenns  of  pmap  as  the  number  of  quadratic 

or  two-factor  interaction  terms  increase,  but  this  requires  further  study.  The  impact  of 
higher  order  interactions  in  an  experiment  is  also  worthy  of  examination.  An 
understanding  of  thresholds  related  to  the  number  and  order  of  nonlinear  terms  that  may 
possibly  be  in  an  experiment  will  guide  analysts  for  using  our  designs. 

The  success  of  VMIN  for  building  new  orthogonal  or  nearly  orthogonal  designs  is 
a  significant  step  for  using  optimization  methods  to  construct  experimental  designs.  The 
initial  design  that  VMIN  uses  is  important  for  quickly  finding  a  solution,  or  in  finding  a 
solution  at  all.  Development  of  a  more  comprehensive  heuristic  to  initiate  VMIN  would 
be  a  major  contribution  to  this  area  of  research.  Our  current  approach  is  to  start  with  a 
new  FRLH  when  VMIN  finds  difficulty  in  converging  to  a  solution.  A  better  method 
than  rms  for  selecting  the  next  column  to  optimize  may  also  improve  the  algorithm.  At 
times,  VMIN  remains  on  the  same  column  throughout  the  optimization  routine  because 
one  run  of  VMIN  on  the  targeted  column  shows  no  improvement,  thereby  resulting  in  the 
same  column  with  the  greatest  rms.  A  means  to  move  the  program  to  the  next  column 
should  help  VMIN’s  performance.  A  process  to  automatically  “skip”  to  the  next  column 
after  no  improvements  in  the  selected  column  will  enable  VMIN  to  continue.  This  forced 
movement  can  in  turn  affect  the  optimization  of  the  unimproved  column  when  the  routine 
eventually  returns  to  it.  Currently,  we  manually  stop  the  program  and  target  the  column 
that  has  the  next  greatest  rms.  Although  this  process  has  proven  effective,  a  more 
automated  means  would  be  very  useful. 
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APPENDIX  A.  SATURATED  NOLH  DESIGNS 


Appendix  A  displays  a  number  of  saturated  NOLH  designs  that  we  developed 
during  the  course  of  our  study.  Each  design  is  designated  with  a  symbol  per  Cioppa 
(2002)  and  the  nonorthogonality  measure  for  the  design. 
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K  with  pmap  =  0.0499 
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First  32  columns  of  with  pmap  =  0.0443 


Final  31  columns  of  N™  with  pmap  =0.0443 
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APPENDIX  B.  PAIRWISE  PLOT  OF  FACTORS  IN  N% 


Appendix  B  displays  a  pairwise  plot  of  the  factors  in  N\].  The  dispersion  of 
points  in  each  pairwise  plot  shows  reasonable  space  fdling  properties. 
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APPENDIX  C.  MAJOR  ROGINSKI’S  DESIGN 


This  Appendix  lists  the  MFML  design  matrix  for  Major  Roginski’s  experiment. 
The  design  is  split  into  eight  sections.  Each  section  heading  describes  the  design  points 
and  the  factors  shown  in  the  section.  It  also  includes  the  nonorthogonality  of  the  design. 
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Roginski  MFML  Section  1 :  n  =  1  thru  72,  k  =  1  thru  1 1  with  pmap  =  0.0593  . 
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Roginski  MFML  Section  2:  n  =  73  thru  144,  k  =  1  thru  1 1  with pmap  =  0.0593  . 


FO  Police  |  Traffic  |  Time  OP  |  Panic  |  Num  Agl  |  Num  Ag2  |  Num  Ag3  |  Num  Ag4 
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Roginski  MFML  Section  3:  n  =  1  thru  72,  k  =  12  thru  22  with pmap  =  0.0593  . 
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Roginski  MFML  Section  4:  n  =  73  thru  144,  k  =  12  thru  22  with p  =  0.0593  . 


/  map 


Num  Foll2  Num  Foll3  Num  Traffl  Num  Traff2 
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Roginski  MFML  Section  5:  n  =  1  thru  72,  k  =  23  thru  33  with pmap  =  0.0593  . 


118 


Roginski  MFML  Section  6:  n  =  73  thru  144,  k  =  23  thru  33  with  pmap  =  0.0593  . 


|  NumTraff3  |  NumTraff4  |  NumTraff5  |  NumTraff6  |  NumTraff7  |  NumTraff8  |  NumTraff9  |  NumTrafflO  |  NumTraffll  |  NumTraff12  |  Desire  Civ  | 
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Roginski  MFML  Section  8:  n  =  73  thru  144,  k  =  34  thru  45  with pmap  =  0.0593  . 
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APPENDIX  D 


MAJOR  BAEZ’S  MFML  DESIGN 
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APPENDIX  E.  A  pZ;  PREDICTIVE  MODEL  WITH 
QUADRATIC  TERMS 

The  final  multiple  linear  regression  model  to  predict  p"™p  for  a  given  n  and  k  is  a 
relatively  simple  expression: 

( p'ZP  )£  =  0.0873  +  7.859 -  0. 109C173  - 1 1 .702»"2/3r1/3 . 

It  is  very  tractable  in  its  representation  of  n  and  k.  We  see  that  as  n  increases  and  k 
remains  constant  the  first  term  is  dominant  in  the  expression  and  the  mean  maximum 
absolute  pairwise  correlation  decreases,  as  Owen  (1994)  theorized.  However,  when  k 
also  grows  large  p"™  reduces  less  quickly  because  of  the  values  in  the  second  term. 

These  results  are  reasonable  since  an  increase  in  k  increases  the  number  of  columns, 
thereby  creating  more  possibility  of  a  new  column  permutation  that  may  have  high 
correlation  with  one  of  the  other  columns. 

Analysis  of  this  model  shows  that  it  is  adequate  for  predicting  the  best  from 

G200  trials,  given  n  and  k.  Statistics  for  the  residuals  indicate  that  the  model  is 
acceptable  for  its  ability  to  predict.  The  range  of  residuals,  in  comparison  with  the 
magnitude  of  the  predicted  values,  is  relatively  small. 

Figure  15  displays  curvature,  suggesting  that  inclusion  of  quadratic  terms  may 
provide  an  improvement. 
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Figure  15.  The  plot  for  the  residuals  resulting  from  a  G200  MLR  shows  curvature  that 
indicates  a  need  to  include  more  complex  terms  in  the  regression  model. 

We  explore  the  impact  of  adding  more  complex  terms  in  the  equation  to  help  the 
model’s  performance.  We  extend  the  model  from  Chapter  II  to  include  quadratic  terms 
for  transformed  n  and  transformed  k.  The  result  is  the  following  eight-term  equation  with 
an  R2  =0.9996 . 

( pZl  =  0.03054  +  0.03208  *  r 1/3  -  0. 100806  *  k~m  +13.06842  *  nm  -  68.38078  *  nm 

-  30. 12779  * k~mn~m  +  254.89 1 7  *  /T  1/3»"4/3  + 1 7.93 1 09  *  r2/ V2/3  -  254.839 1  *  r2/ V4/3 

The  residual  plot  (Figure  16)  for  this  new  model  continues  to  show  curvature,  but 
the  sizes  of  the  residuals  are  a  full  order  of  magnitude  less  that  those  from  the  more 
simple  equation. 
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Figure  16.  The  residual  plot  for  the  new  multiple  linear  regression  model  with  quadratic 
terms  still  show  curvature,  but  the  sizes  of  the  residuals  are  much  smaller 
than  from  previous  models. 

The  trade-off  with  this  more  precise  formula  is  an  increase  in  difficulty  in 
explaining  the  impacts  of  n  and  k,  as  well  as  a  reduction  in  the  ease  in  which  an 
experimenter  can  use  it.  Although  the  residual  analysis  of  this  multiple  linear  regression 
model  reveals  that  there  is  some  curvature  in  the  relationship,  the  amount  of  error  in  the 
predicted  values  is  relatively  small.  Since  the  majority  of  experimenters  would  use  this 
formula  mostly  as  a  guide  to  find  the  appropriate  design  dimensions  and  not  itself, 
the  simpler  model  serves  our  purposes. 
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